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ABSTRACT 


The  advent  of  the  digital  computer  has  given  a  new 
impetus  to  the  development  of  efficient  mathematical  methods 
and  computational  techniques  of  mathematical  programming, 
i.e.  the  problem  of  optimizing  a  constrained  multivariate 
function.  Dantzig,  in  19^7 ^  introduced  the  new  discipline 
of  Linear  Programming  and  the  Simplex  method  as  a  practical 
method  of  solving  the  linear  programming  problem.  Linear 
Programming  is  concerned  with  the  optimization  of  a  linear 
function  subject  to  linear  constraints  and  a  surprisingly 
large  class  of  industrial  and  business  problems  could  be 
formulated  as  such  an  optimization  problem.  A  great  interest 
remains,  however,  in  the  many  problems  which  do  not  real¬ 
istically  fit  into  the  framework  of  Linear  Programming;  this 
remaining  field  of  nonlinear  programming  -  the  problem  of 
optimizing  a  nonlinear  function  subject  to  linear  or  non¬ 
linear  constraints  -  is  both  of  practical  and  theoretical 
interest . 

Intimately  associated  with  the  mathematical  program¬ 
ming  problem  is  the  problem  of  solving  nonlinear  systems  - 
a  problem  relatively  neglected  in  the  literature.  Included 
in  this  thesis  along  with  a  broad  review  of  the  classical 
and  more  recent  mathematical  literature  in  linear  and  non¬ 
linear  algebraic  problems  are  the  numerical  and  computa¬ 
tional  aspects  of  the  algebraic  and  non-algebraic  nonlinear 
systems.  The  methods  considered  include  the  extension  of 
iterative  techniques  usually  applied  to  linear  systems  and 
the  more  important  "Gradient  Methods"  of  optimization.  The 


more  recent  results  which  are  critically  analysed  include 
the  concepts  of  approximation  of  a  nonlinear  function  in 
a  region  near  a  solution  by  a  positive  definite  quadratic 
function;  the  exploitation  of  the  special  properties  of 
such  a  function;  and  the  conjugation  of  successive  directed 
steps  towards  the  solution. 

Examination  of  the  available  literature  revealed: 

(i)  that  practical  methods  for  solving  nonlinear 
algebraic  systems  in  several  variables  are  not  available, 

(ii)  that  many  methods  are  available  for  special 
nonlinear  problems,  viz  quadratic  functions, 

(iii)  that  many  of  these  methods  have  been  in¬ 
adequately  tested  or  are  not  applicable  to  a  wide  range  of 
problems , 

(iv)  that  algorithms  for  digital  computers  have  only 
rarely  been  given,  and 

(v)  that  but  very  few  numerical  methods  can  be  made 
readily  accessible  to  the  practising  numerical  analyst. 

In  this  thesis  the  concentration  is  on  (iii),  (iv) 
and  (v)  i.e.  on  testing,  devising  algorithms,  and  the  more 
specific  presentation  of  the  available  mathematical  litera¬ 
ture  applicable  to  the  nonlinear  systems  and  nonlinear 
programming  problems . 
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NOTATION 


Throughout  this  thesis  use  is  made  of  matrix 
notation.  The  dimensions  of  a  matrix  or  vector  used  in  any 
section  are  shown  as  soon  as  possible  after  their  first  appear¬ 
ance  in  that  section.  A  matrix  or  vector  may  be  modified  by 
a  superscript  or  subscript  which  may  have  a  special  signific¬ 
ance  in  its  use. 
latrix  Notation 

a  matrix  composed  of  n  rows  and  m  columns. 


A 

x 


nxm 

nxl 


N. 


q 


a  set  of  n  variables  x-,  .xn,  .  .  .  .  x  which  de- 

12  n 

fine  a  point  or  n-dimensional  column  vector 
in  n  space. 

lxn  a  row  vector  obtained  by  transposition  of  a 
column  vector. 

nxq  the  subscript  here  gives  added  information  on 
the  size  of  the  matrix. 

I  :  nxn  a  special  matrix  frequently  used  defined  as  the 
identity  matrix  [5^.]  ,  where  =  1  ,  if,,  and 

only  if,  i  =  j. 

The  following  notation  is  used  in  the  description 
of  the  computing  algorithms  and  has  a  special  significance  in 
computer  applications  where  the  assumption  is  made  that  any 
quantity,  e.g.  a  matrix,  vector,  number,  etc.  may  be  referred 
to  by  a  symbol. 

Algorithmic  Notation 

P  the  quantity  a  is  set  equal  to  f3 

P  ;  a<-  (3  the  notation  P  ;  labels  the  associated  in¬ 
struction  as  p. 

-*  P  the  next  instruction  is  the  one  labelled  P. 


a  :  P  ,  l->  P  the  next  Instruction  Is  P  If  a  r  p  Is 

satisfied  where  r  Is  a  relation  such  as 
<  ,  /  ,  etc. 

N  ©  1  the  quantity  1  Is  concatenated  (designated 

by  the  ©  symbol)  with  N;  e.g.  If  N  :  mxn  , 
i  :  mxl  then  N  ©  1  :  (m+l)xn. 
g<s=5>Vf(x)  the  equivalence  symbol  is  used  to  con¬ 

dense  material  or  imply  a  whole  series  of 
operations . 


1 


CHAPTER  I 
INTRODUCTION 

The  advent  of  the  large  high-speed  electronic 
computer  and  the  desire  to  optimize  systems  have  given  a  new 
impetus  to  the  study  of  the  optimization  problem.  Interest 
in  this  problem  arose  in  economics ,  transportation,  and  manage¬ 
ment  sciences.  One  aim  of  this  thesis  is  to  review  and  analyse 
critically  the  classical  and  more  recent  contributions  to  the 
problems  of  solving  systems  of  nonlinear  equations  and  optimizing 
a  nonlinear  function  subject  to  contraints.  A  second  aim  is  to 
use  this  work  in  developing  practical  methods  for  high-speed 
computers . 


The  Introductory  chapter  sketches  In  the  historical 
background  to  the  solution  of  equations  and  systems  of  equations; 
and  reviews  the  growth  of  mathematical  programming:  finally, 
the  contents  of  each  chapter  are  summarized. 

§  1.1  Historical  Background 


At  the  outset  we  shall  attempt  to  place  the  nonlinear 
problem  in  its  proper  historical  context.  The  Egyptian,  Greek, 
Arabian,  and  later  the  medieval  mathematicians  were  restricted 
to  rational  operations  and  the  extraction  of  roots;  they  were 
able  to  solve  the  general  polynomial  of  degree  n  <  4.  The 
first  rigorous  proof  that  solution  by  radicals  would  be  in¬ 
sufficient  for  polynomials  of  degree  h  >  5  wasppublis.hed  in 
1824  by  Abel.  This  indicates  that  an  important  line  of  division 
lies  between  polynomial  equations  of  the  first  four  degrees  and 
those  of  higher  degree.  The  unification  of  algebra  and 
geometry,  i.e.  analytic  geometry,  can  be  attributed  to  Descartes 
"Geometrie"  in  1637.  A  deeper  insight  into  the  nature  and 
properties  of  an  algebraic  equation  was  given  by  the  theory  of 
substitutions  discovered  by  Lagrange  in  1770  and  finally  by 


. 


■ :  <  " 


. 


■  i 


-  2  - 

the  theory  of  groups  attributed  to  Galois ,  Cauchy,  et.  al. 
circe  1844-46.  Although  the  fundamental  theorem  of  algebra 
was  posed  at  least  two  centuries  earlier,  a  rigorous  proof 
was  not  available  until  Gauss  in  1799. 

It  is  known  that  a  special  case  of  a  system  of 
linear  equations  had  been  solved  by  the  early  Greeks  but  a 
satisfactory  treatment  of  linear  systems  was  not  possible 
until  determinants  had  been  developed.  This  is  a  comparatively 
recent  topic  in  algebra,  having  its  origin  in  the  writings  of 
Leibnitz  at  the  end  of  the  seventeenth  century,  and  assuming 
a  significant  position  in  mathematical  literature  during  the 
latter  part  of  the  eighteenth  and  first  part  of  the  nineteenth 
century.  Sylvester's  dialytic  method  of  elimination  is  a 
special  application  of  the  theory  of  determinants  to  the  non¬ 
linear  algebraic  system.  This  method  converts  the  nonlinear 
system  into  a  problem  of  a  polynomial  equation  in  one  variable . 
The  dialytic  method  is  of  limited  practical  value  and  is 
restricted  to  systems  of  low  order  and  low  degree;  e.g., 
the  second  order  algebraic  system  fn(x,y)  =  0,  f  (x,y)  =  0,  of 
total  degree  n,  m,  respectively,  has  a  resultant  polynomial  of 
degree  <  mn.  Another  important  application  of  the  theory  of 
determinants  is  the  solution  of  a  nonlinear  system  by  iterative 
techniques . 

The  numerical  solution  of  algebraic  equations  remain¬ 
ed  a  very  difficult  problem  even  after  the  explicit  expression 
for  the  roots  of  the  polynomial  was  known.  This  computational 
difficulty  led  to  a  search  for  practical  and  particular 


, 

' 
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methods  as  opposed  to  general  methods.  For  example. 

Cardan's  solution  of  the  cubic  requires  the  evaluation  of 
the  following  expression: 

(-g+k  /-l) ~>+(-g-k  /TI)3 
— 2 

Now  there  is  no  general  arithmetical  process  for  extracting 
the  cube  root  of  such  complex  numbers,  and  consequently  this 
formula  is  useless  for  purposes  of  arithmetical  calculation. 

On  an  electronic  computer  the  solving  of  the  cubic  equations 
may  be  handled  by  the  unsophisticated  but  simple  method  of 
bisections.  This  serves  as  a  good  example  of  the  distinction 
between  the  analytic  and  practical  viewpoint. 

We  may  observe  that  for  many  applications  only 
approximate  values  of  the  real  roots  are  needed.  We  will 
assume  for  the  sake  of  brevity  that  we  need  consider  only 
equations  with  real  coefficients.  Vieta's  method  of  obtaining 
the  approximate  real  roots  of  a  polynomial  was  published  in 
1600  and  finally  perfected  as  Horner's  method  in  1821.  Sturm's 
theorem  (proved  in  1829)  furnishes  the  scientific  foundation 
for  every  method  of  finding  the  approximate  values  of  the  roots 
in  an  algebraic  equation  with  real  coefficients;  it  determines 
the  number  of  real  roots  between  two  arbitrarily  assigned 
numbers.  In  ±j6j,  Lagrange  published  a  theoretically  simple 
method  for  finding  the  approximate  value  of  an  irrational 
root  by  means  of  continued  fractions  where  the  required  informa¬ 
tion  is  knowledge  of  the  existence  of  a  real  root  within  an 
interval.  Although  this  method  is  perspicuous  and  exhibits 
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clearly  the  reason  for  each  step,  it  has  not  been  used  as 
widely  as  the  well-known  Horner's  method. 

A  useful  technique  in  solving  certain  algebraic 
problems  is  by  graphic  methods.  These  methods  were  developed 
mostly  since  the  beginning  of  the  nineteenth  century  and  they 
serve  either  as  a  rough  check  on  the  accuracy  of  calculations 
or  as  a  sufficiently  accurate  solution  for  the  problem  in  hand. 
The  literature  on  graphic  algebra  is  very  extensive  and  among 
the  introductory  treatises  we  may  mention  the  Graphic  Algebra 
by  Phillips  and  Beebe. 

The  solution  of  transcendental  functions  poses  its 
own  special  problems  since  we  have  to  consider  the  possibility 
of  an  infinite  number  of  roots  or  no  roots  whatsoever.  For 
example,  the  function  cos  x  has  an  infinite  number  of  real 
roots  (x=+  k ir+T^,  k=0, 1,  .  .  .  ),  but  ex  has  neither  real  nor 
complex  roots.  Some  of  the  methods  available  for  finding  the 
roots  of  polynomials  can  be  extended  to  solve  the  transcendental 
problems . 


The  problem  of  solving  systems  of  nonlinear  non- 
algebraic  equations  has  been  relatively  neglected  in  the 
mathematical  literature.  The  method  of  iteration  can  quite 
often  be  successfully  applied  to  the  solution  of  the  nonlinear 
system.  A  recent  approach  to  this  problem  is  to  solve  for  the 
minimum  of  an  associated  test  function.  The  methods  usually 
employed  are  known  as  gradient  methods.  The  solution  of  the 


nonlinear  system  is  much  more  complicated  than  the  solution 
of  one  equation  in  one  unknown.  The  basic  difficulty  is 
the  lack  of  effective  methods  for  the  finding  of  initial 
approximate  solutions.  Newton's  method.  Simple,  and  Gauss  - 
Seidel  are  usually  effective  for  making  solutions  more  accur¬ 
ate  . 


§1.2  Mathematical  Programming 

Mathematical  programming  is  concerned  with  the 
problem  of  maximizing  or  minimizing  a  function  of  several 
variables  that  are  restricted  by  a  number  of  constraints.  The 
problem  of  optimizing  such  a  function  restricted  to  equality 
constraints  is  intimately  associated  with  the  general  optimiza¬ 
tion  problem.  The  classical  technique  applied  to  the  equality 
constrained  optimization  problem  is  Lagrange's  method  of 
Undetermined  Multipliers.  Although  some  success  has  been 
achieved  in  adapting  Lagrange's  method  to  modern  programming 
problems,  the  trend  in  the  last  few  years  has  been  to  evolve 
gradient  and  conjugate  gradient  methods  as  the  most  efficient 
methods  of  computational  attack. 

The  theoretical  and  computing  aspect  of  the 
optimization  problem  subject  to  constraints  has  received 
renewed  attention  in  the  last  two  decades  and  given  rise  to 
the  new  disciplines  of  linear  and  nonlinear  programming.  In 
19^7,  Dantzig  in  cooperation  with  Wood  was  concerned  with  an 
analysis  of  military  programming  and  he  proposed  that  the 
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Interrelationships  between  the  activities  of  a  large  organiza¬ 
tion  should  be  viewed  as  a  model  of  Linear  Programming  in  which 
the  best  program  is  determined  by  minimizing  a  linear  function 
subject  to  linear  constraints.  That  this  concept  of  linear 
programming  could  be  used  in  other  contexts  was  soon  recogniz¬ 
ed  and  many  mathematicians  became  interested  in  the  theoretical 
and  computational  aspects  of  this  field.  The  fundamental 
literature  in  this  field  may  be  found  in  Koopmans  (1951)  which 
includes  the  work  of  Dantzig  (1951  a,  b)-  the  orginator  of  the 
basic  principles  of  Linear  Programming.  In  many  problems  a 
far  more  realistic  formulation  is  possible  if  a  nonlinear  model 
is  used.  In  such  cases  the  optimum  solution  obtained  by  a 
nonlinear  programming  method  should  have  considerably  more 
validity  than  the  solution  to  the  approximate  linear  programm¬ 
ing  problem.  Examples  of  two  possible  new  areas,  of  application 
are  optimum  design  of  process  equipment ,  and  the  direct 
solution  of  problems  in  mechanics,  physics  and  chemistry  which 
require  that  the  energy  (given  as  a  function  of  the  system 
variables)  be  minimized  subject  to  one  or  more  subsidiary 
conditions,  Rosen  (i960) .  The  most  recent  work  in  the  field 
of  nonlinear  programming  is  reviewed  in  Spang  (1962),  Turner 
(i960),  Riley  and  Gass  (1958),  Wolfe  (1962),  and  Zoutendijk 
(I960) . 

§1.3  Survey  of  the  Thesis 


The  subject  matter  of  this  thesis  is  shown  in  detail 


in  the  "table  of  contents"  but,  nevertheless,  it  will  not  be 


. 
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superfluous  to  make  a  short  survey  of  each  chapter.  The  first 
chapter  consists  of  the  introduction. 

Chapter  II  reviews  the  more  pertinent  classical 
theorems ,  definitions,  and  techniques  concerned  with  linear 
and  nonlinear  algebraic  equations  and  systems  of  equations. 
Numerical  solutions  and  iterative  techniques  which  may  be 
applied  to  these  algebraic  problems  are  also  examined. 
Graphical  methods  -  which  had  a  considerable  voque  before 
digital  computers  became  available  and  are  still  sometimes 
useful  in  analogue  computers  -  are  considered  as  are  problems 
concerned  with  curve  tracing.  In  preparation  for  the  more 
difficult  nonlinear  problems,  we  review  the  classical  problems 
of  solving  the  linear  system  of  n  equations  in  n  unknowns. 

The  particular  methods  reviewed  include  the  Gauss  Elimination 
technique.  Simple  iteration,  and  Gauss-Seidel  iteration. 

In  Chapter  III  we  review  the  problem  of  solving 
systems  of.  nonlinear  algebraic  and  non-algebraic  equations. 

The  question  of  existence  of  solutions  for  the  algebraic 
system  is  satisfactorily  answered  by  converting  the  nonlinear 
algebraic  system  into  a  polynomial  in  a  single  variable  by 
the  dialytic  method  of  elimination.  The  computational  con¬ 
siderations  of  the  dialytic  method  suggest  that  this  method 
is  in  general  unsuitable  as  the  basis  for  a  numerical 
solution.  Numerical  .  methods  which  may  be  effectively  applied 
to  the  nonlinear  problem  include  the  extensions  of  Simple  and 
Gauss-Seidel  iteration  methods  and  the  use  of  gradient  or 
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optimization  techniques.  A  more  recent  approach  is  the 
almost  experimental  method  of  direct  search. 

The  basic  concepts  and  elementary  techniques  of 
Linear  Programming  are  outlined  in  Chapter  IV.  This  review 
attempts  to  include  the  more  fundamental  theorems  and 
methods  in  this  new  discipline.  We  review  the  basic  Simplex 
and  Dual  Simplex  methods  but  do  not  attempt  to  extend  or 
evaluate  the  other  methods  currently  available. 

In  Chapter  V  we  review  the  more  difficult  mathemati¬ 
cal  programming  problem  -  nonlinear  programming.  A  nonlinear 
programming  problem  is  obtained  when  the  linearity  restriction 
is  removed  from  the  objective  function  and  constraints.  We 
examine  In  some  detail  a  special  case  of . nonlinear  programming 
defined  as  quadratic  programming  -  the  problem  of  minimizing 
a  convex  quadratic  function  (or  its  dual),  subject  to  linear 
constraints  -  in  great  detail  emphasizing  the  importance  of 
gradient  methods. 

In  the  last  Chapter  we  attempt  to  summarize  the 
conclusions  applicable  to  the  nonlinear  problem.  A  comparison 
of  methods  on  any  particular  problem  or  group  of  problems  Is 
not  very  helpful  since  there  will  always  be  some  particular 
function  for  which  a  given  method  is  best  suited.  We  are 
therefore  left  with  the  exercise  of  experience  and  intuition 
in  order  to  fit  the  appropriate  method  to  the  particular 
problem. 
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The  organization  of  the  succeeding  Chapters  is  as 
follows:  an  introduction;  definitions,  notation,  and  theorems; 

methods  and  examples  where  applicable.  The  computing  algorithm 
-  a  device  which  describes  the  numerical  calculational  proced¬ 
ure  -  is  freely  used  to  challenge  the  interest  of  the  reader 
and  yet  present  the  numerical  calculation  in  a  simple  and 
precise  manner. 
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CHAPTER  II 

CLASSICAL  ANALYTIC  AND  ITERATIVE  METHODS  FOR  SOLVING 
LINEAR  AND  NONLINEAR  ALGEBRAIC  PROBLEMS 

§2,1  Introduction 

The  roots  of  an  arbitrary  polynomial  in  one 
variable  Pn(x)=o  can  be  given  in  explicit  form  for  n  <  4. 

For  n  >  5  the  explicit  formulae  are  in  general  expressed  in 
terms  of  hyperelliptic  functions;  they  are  often  complicated 
and  unwieldy  and  are  rarely  used  as  the  basis  of  a  numerical 
method  of  solution.  Classical  methods  are  available  for  the 
numerical  solutions  of  the  linear  algebraic  problem 
Ax=b  (  A:nxn  matrix;  x,b:nxl  vectors)  and  the  polynomial 
equation  in  one  variable  Pn(x)=o;  and  some  results  are  avail¬ 
able  for  the  more  complicated  set  of  nonlinear  algebraic 
equations  in  n  variables  F(x)=o  (F(x),x:nxl  vectors).  Questions 
of  uniqueness  and  existence  of  solutions  for  these  algebraic 
problems  have  been  extensively  considered  in  the  literature 
which  is  scattered  and  incomplete. 

In  this  chapter  the  more  fundamental  and  well-known 
together  with  some  less-known  but  useful  results  in  this  field 
are  reviewed  and  their  pertinence  to  this  thesis  is  briefly 
discussed  in  the  succeeding  chapters. 

The  important  questions  arising  from  ill-conditioning 
(i.e.  polynomials  whose  zeros  are  oversensitive  to  small 
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changes  in  the  coefficients)  is  too  large  a  topic  to  be 
included  within  the  scope  of  this  thesis.  Wilkinson  (1959) 
has  shown  that  very  useful  results  can  be  obtained  for  the 
polynomial  case  by  quite  simple  analysis.  We  discuss  his 
work  briefly  but  do  not  attempt  to  discuss  the  extensions 
of  his  work  to  the  very  difficult  problems  of  ill-conditioning 
for  nonlinear  algebraic  problems  in  many  unknowns. 

Once  again*  it  may  be  emphasized  that  the  classical 
techniques  considered  in  this  chapter  are  not  always  good 
techniques  for  electronic  computers. 

The  principle  of  iteration  was  used  as  early  as 
1674  for  the  determination  of  square  roots.  This  principle 
has  since  then  been  extended  to  finding  the  roots  of  poly¬ 
nomials*  solving  systems  of  equations*  etc,*  and  is  becoming 
increasingly  important  as  an  approach  for  the  numerical 
solution  of  algebraic  and  transcendental  equations.  We  study 
some  of  the  results  in  this  field. 

§2.2  Definitions  and  Notation 

§  2.2.01  We  shall  use  Pn(x)  to  denote  a  polynomial  in  x  of 

degree  n  expressed  in  the  normal  form 

Pn  W  =  a0xn+  a1xn_1+  ...  +  an_1x+aQ*  aQ/o,  ^ 

n 

where  aQ  is  the  coefficient  of  x  and  the  polynomial  is  ordered 
in  descending  powers  of  x. 
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$  2.2.02  The  previous  notation  can  be  readily  generalized 
to  a  polynomial  In  n  variables  x^,  xn,  xn  and  expressed 

In  the  normal  form 

f(x1,xOJ111JxJ  =  g  ( Xq )  }  x  ) x^  +  . 


+ 


X1  +  sm(v---’xf 


(xOJ . . . ,xn) 
(2) 


where  m  is  the  highest  degree  of  any  term  containing  x^ 
and  the  g(x)'s  are  polynomials  in  the  n-1  variables 
(Xgj . . . jXn) .  We  may  Interchange  the  role  of  x^,  1  /  1,  with 
x^  in  (2)  and  obtain  a  similar  expression  in  x^.  The  highest 
degree  in  x^  of  any  term  in  the  polynomial  in  the  form  (2) 
whose  coefficient  is  not  zero  is  called  the  degree  of  the 
polynomial  in  x^,  and  the  highest  total  degree  of  any  term 
whose  coefficient  is  not  zero  is  called  the  degree  of  the 
polynomial .  The  particular  polynomial  equation  in  n 
variables  where  m  is  the  maximum  sum  of  the  exponents  of  any 
one  term.,  i.e.  the  polynomial  of  total  degree  m,  is  denoted  by 

fm(Xl'X2J • ' • ’xn)  =  °' 

§  2,2,03  The  n  equations  of  the  first  degree  in  n  unknowns 

allxl  +  a12x2  +  • * *  +  alnxn  =  bl 

(3) 


a21Xl  + 


+  a0  x  =  b„ 
2n  n  2 


anlxl  +  an2x2  + 


+  a  x  =  b 
nn  n  n 


where  a's  are  any  constant,,  real  or  complex,  is  defined 
as  a  system  of  n  linear  simultaneous  algebraic  equations  in 
n  unknowns.  The  system  of  equations  (3)  may  be  denoted  more 
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briefly  by 


—  b^  i  — •  1  ^  ■  ■  •  j  n ,  j  - 1. ,  ... }  n } 

or  in  matrix  form  as 

Ax  =  bj  where  A  Is  an  nxn  matrix;  b,x  ;  nxl  column  vectors. 


§  2.2.04  We  can  extend  the  notation  for  linear  systems  to 
nonlinear  systems  as 


f1(Xrx2,...,xn)  =  b1 
f2(xvxz,...,xn)  =  b2 


fn(sl'V""sn)  =  bn  or  as 

F(x)  =  b,  where  F(x),  Xj  and  b*  are  nxl 

column  vectors. 

§  2.2.03  A  Sturmian  sequence  is  a  sequence  of  n+1  functions 
fQ(x),  f^(x) f  (x)  which  satisfy  on  a  given  interval 
[ a, b ]  of  the  real  axis  the  following  conditions; 

1.  f^(x)  are  continuous  functions  (i=sO,  1,  .  .  .  ,n)  . 

2.  Sign  f  (x)  =  constant  for  xe[a,b] . 

3.  If  f1(x)  =  0,f1+1(x)  and  fi_1(x)^0  for  xe[a,b] 
and  all  i. 


4.  If  f±(x)  =  0,  sign  f1_1(x)  =  sign  f1+1(x) 

(i^lj • • . ,n-l) . 

5.  If  xn  =  x  is  a  root  of  f  (x),  then  for  h 

1  o 

sufficiently  smalls 

sign  f  (x-h)  =  -1,  sign  fQ(x+h)  =  1  Todd  (1962). 
f ^ (x-h)  f ^ (x+h) 
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§  2.3  Polynomials  In  One  Variable 

The  following  review  of  the  classical  results  for 
polynomials  in  one  variable  is  certainly  desirable  before 
we  consider  the  more  difficult  nonlinear  algebraic  problem. 

A  more  extensive  treatment  may  be  found  in  Burnside  and 
Panton  (1912),  Bocher  (1922),  and  Turnbull  (1944). 

§  2.31  Theorems,  Lemmas,  and  Algorithms 

§  2. 31. 01  Fundamental  Theorem  of  Algebra 

Every  polynomial  equation  Pn(x)  of  degree  n  (n>l) 
has  at  least  one  root. 

Corollary: 

Every  polynomial  equation  PR(x)  of  degree  n  (n>l) 

has  exactly  n  roots. 

§  2.31.02  Factor  Theorem 

If  x.  is  a  root  of  P  (x)=0  then  x-x.  is  a  factor 
1  n'  '  1 

of  Pn(x);  i.e.  PR(x)  =  (x-x±)  Pn_1(x) 

§  2.31.03  Remainder  Theorem 

If  a  polynomial  P  (x)  is  divided  by  the  linear 
function  (x-x^) ,  where  x^  is  any  constant,  the  remainder  R  is 
defined  by  R  =  Pn(x^) . 

§  2.31.04  Descartes1  Rule  of  Signs 

The  polynomial  equation  Pn(x)=0  cannot  have  more 

positive  real  roots  than  there  are  variations  in  sign  of 

successive  coefficients  of  PR(x)  or  more  negative  roots  than 

there  are  variations  in  sign  of  P  (-x). 

o  n 
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§  2.31.05  Theorem 

If  the  polynomial  P  (x)  is  evaluated  at  x=a  and 

x=b,  where  a<b  and  both  a  and  b  real,  and  If  sign  Pn(a)/ 

sign  P  (b)  then  there  exists  at  least  one  x=x  such  that 

p  (x^)=0  where  a<x  <b. 
n v  o'  o 

§  2.31.06  Sturm's  Theorem 

Although  the  Descartes  Rule  of  Signs  is  remarkable, 

an  uncertainty  as  to  the  exact  number  of  real  roots  in  an 

equation  still  remains.  The  problem  of  finding  a  definitive 

test  was  finally  soved  in  1829  by  Sturm.  Sturm  showed  how 

to  find  for  any  polynomial  equation  the  exact  number  of 

real  roots  which  lie  within  any  given  range  of  values  using 

only  rational  operations. 

Theorem:  There  exists  a  set  of  real  polynomials  Pn(x), 

P^  (x),  f ^ (x) , . . . , f  (x)  whose  degrees  are  in  descending  order, 
such  that  if  b>a,  the  number  of  distinct  real  roots  of  P  (x)=0 
between  x=a  and  x=b  is  equal  to  the  excess  of  the  number  of 
changes  of  sign  in  the  sequence  Pn(x),  P^x),  f2(x) , . . . , fn(x) 
when  x=a  over  the  number  of  changes  of  sign  when  x=b. 
Corollary:  If  f  (x)  is  a  remainder  involving  x  and  such  that 
f  (x)  remains  positive,  or  remains  negative,  for  all  real 
values  of  x  between  a  and  b,  then  the  sequence  need  not  be 
prolonged  beyond  f  (x) . 

§  2.31.07  Examples  of  Sturm's  Theorem 

Example  1:  Find  the  number  and  situation  of  the  real  roots 


of  the  polynomial  equation 
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Po(x)  =  x^-7x+7  =  0. 

Now  p'(x)  =  3x2-7, 

f2(x)  =  P3(x)-q1P^(x)  =  2x-3, 
f3(x)  =  P3(x)-q2f2(x)  =  1. 
Hence  we  construct  the  following  table: 


X 

p3(X) 

p3(x) 

f2(x) 

f3(x) 

-00 

_ 

+ 

— 

+ 

-4 

- 

+ 

- 

+ 

-3 

+ 

+ 

- 

+ 

+1 

+ 

- 

- 

+ 

+2 

+ 

+ 

+ 

+ 

00 

+ 

+ 

+ 

+ 

By  inspection,,  we  see  that  one  real  root  lies  in  [-4,-3] 
and  two  real  roots  are  in  [1,2]. 

Example  2:  Find  the  nature  of  the  roots  of 

P4 ( x)  =  x^-5x^+9x2-7x+2  =  0. 

Now  P^(x)  =  4x^-15x2+l8x-7 * 

f  2  ( x)  =  Pi|(x)-q1Pj[(x)  =  x2-2x+1  =  (x-l)2, 

f 3 ( x)  =  P^(x)-q2f2(x)  =  0. 

Hence 


X 

P4M 

P4U)  f2(x) 

-00 

+ 

+ 

+00 

+ 

+  + 
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We  observe  that  the  given  equation  has  only  two 
distinct  real  roots;  but  one  of  these  Is  a  triple  root,  as  Is 
evident  from  the  form  of  f^x)  =  (x-l)^. 

§  2.31.08  Euclid's  Algorithm 

From  the  Remainder  Theorem  we  have 
pnU)  =  pn_i (x)  (x-h)  +  R,  R  =  pn(h)  . 

A  very  useful  and  powerful  method  of  describing  a  numerical 
computation  is  by  the  computing  algorithm;  this  technique  is 
used  extensively  throughout  this  thesis.  We  choose  a 
compact  symbolic  notation  which  is  almost  self-explanatory. 
The  use  of  this  notation  -  outlined  in  the  notation  table  - 
enables  us  to  circumvent  long  periphrastic  explanations. 

The  computing  algorithm  using  the  symbolic  notation  is  now 
used  to  describe  the  well-known  Euclid's  Algorithm  for 
determining  the  remainder  when  a  polynomial  is  divided  by 
a  linear  factor. 

Let  n,  h,  a:  (n-fl)xl  define  the  problem;  i.e. 

Pn(x)  =  a0xn  +  ajx"-1  +  ...  +  an. 

The  remainder  R  =  P  (h)  is  then  given  by: 

k  <-  0 

1 ;  k  <—  k  +  1 

ak-  ak +  ak-ixh 

n:  k  >1 


END. 
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e.g.  P2|(x)  = 


3x^  -  5x^  +  10x^  +  llx  -  6l;  h 


k^  0 
k*—  1 


a1  4-  -5  +  3x3  =  4 


=  3,  n  =  4, 


1  <  4 

k<-  2 


a2  10  +  4x3  =  22 


2  <  4 

k<-  3 


a^^-  11  +  22x3  =  77 


3  <  4 
k4-  4 

a^+-  -61  +  77x3  =  170 

4  {  4 

R  <—  a^  —  170 
END. 


§  2.31.09  Algorithm  for  Increasing  a  Root  by  h 

This  problem  is  essentially  equivalent  to  Horner's 
method  of  finding  a  numerical  solution  of  a  polynomial. 
Hence,  given  n,  h;  a  :  (n+l)  xl,  the  resultant  polynomial  is 

given  by  the  following: 


. 

r  > 

+  I 

•  '  l  \  M  J.! 
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conditional 

branch  comment 


i  <- 

n 

1;  k  «- 

0 

2;  k  «- 

k+1 

ak^ak+ak-ixh 

i  : 

k  ,  ^  2 

is 

stage  1 

complete 

j0  <- 

i- 1 

i  : 

1  ,  £l 

is 

stage  2 

complete 

END 

5  0  4 

x^  -  3x  - 

24x^  +  95x^  -  46x  - 

101 

)  h  =  1, 

n  =  5 

l  5 
k<-  0 
k  1 

a£~  -3  +  lxl  =  -2 

1  <  5 

k  2 

a^  -24  +  -2x1  =  -26 

2  <  5 

k  ^  3 

a^  26  +  95x1  =  69 

3  <  5 
k  4 

aif~  69  +  -46x1  =  23 


4  <  5 
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-101x1  =  -78 
stage  1  Is  complete 


lxl  =  -1 


a^  -26  +  -lxl  =  -27 

2  <  4 
k .  -  3 

a^  *-  69  +  -27x1  =  42 

3  <  4 

k  «-  4 

a^  23  +  42x1  =  _65 

4  =  4  ,  stage  1  is  complete 

i  *-  3 

1  <  3 

k  *-  0 

k  +-  1 

a^  -1  +  lxl  =  0 

1  <  3 

k  «-  2 

a2  «-  -27  +  Oxl  =  -27 

2  <  3 


a5  23  + 
5  =  5, 

i  4 
1  <  4 

k  0 
k  «-  1 

a1  +-  -  2  + 
1  <  4 

k  +-  2 


:  .? :  I- 

■■i-  1 I  o^,sc/s  ,  ..  - 
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k  3 

a ,  «-  42  +  -27x1  =  15 
3  =  3  j  stage  one  is  complete 

i  <-  2: 

1  <  2 
k  «-  0 
k  <-  1 

a^  0  +  lxl  =  1 
1  <  2 
k  <-  2 

a2  <-  -27  +  lxl  =  -26 
2=2,  stage  one  is  complete 

1  «-  1 
1  £=.  1 
k  <-  0 
k  <-  1 

a  <-  1  +  lxl  =  2 

2  >  1  ,  stage  one  is  complete 

i  <-  0 

1  >  0  ,  stage  two  is  complete 

END. 

Our  resultant  polynomial  is  given  as 

P;!(X)  X5  +  2X:4  -  26X3  +  15X2  +  65X  -  78 ;X=x-l 

Horner's  method  has  become  less  popular  in 


recent  years  as  a  practical  numerical  method  of  finding  the 
real  roots  of  a  real  algebraic  polynomial  equation.  Gill 
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(1958),  however,  has  constructed  a  very  simple  computer 
program  which  gives  the  real  roots  in  binary  form.  The 
only  arithmetic  operations  involved  are  addition,  subtraction, 
and  dividing  by  two.  Convergence  to  a  real  root  is  assured 
although  at  a  linear  rate. 

i 

§  2. 31. 10  Euclid s  Algorithm  for  Two  Polynomials  in  One 
Variable 

Let  f(x)  and  g(x)  be  the  two  polynomials 
f(x)  =  aQxn  +  ....  +  an_1x  +  an, 

g(x)  =  bQxm  +  ...  +  b  ^x  +  where  n>m>0 . 

By  the  greatest  common  divisor  of  these  two 
polynomials  is  meant  their  common  factor  of  greatest  degree . 

Let  us  apply  Euclid's  algorithm  to  f(x)  and  g(x)  precisely 
as  we  did  to  determine  the  remainder  when  a  polynomial  is 
divided  by  a  linear  factor.  We  thus  get  the  system  of 
identities 

f  (x)  =  Q0(x)  g(x)  4-  R^x), 

q(x)  =  Qx(x)  R1(x)  +  R2(x), 

Vi(x)  =  Vx)  rp(x)  +  rp+i- 

Then  q(x),  R1(x),  ...  are  polynomials  of  decreasing  degrees, 

so  that  after  a  finite  number  of  steps  a  remainder  is  reached 
which  is  a  constant  here  indicated  as  r  Clearly,  the 

Jr  i  -L 

necessary  and  sufficient  condition  that  f(x)  and  g(x)  be 
relatively  prime  is  that  R  ^/0.  If  Rp+p.=0  then  R^(x)  is  the 
greatest  common  divisor  of  f(x)  and  g(x),  Bocher  (1922). 
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§  2.31-11  Dialytlc  Elimination  for  Two  Polynomials  In  One 


Variable 


Let  f(x)  and  g(x)  be  the  two  polynomials 

f (x) =  aQxn  +  ....  +  an, 

g(x) =  b  xm  +  . . . .  +  b  .  where  n>m>0. 

'  o  nr  — 

We  multiply  the  first  equation  by  the  successive 

powers  of  x. 


m-1  m-2 

x  ,  x  ,  .  .  . , 

and  the  second  equation  by 

n-1  n-2 


2 


x  ,  x  ,  .  .  . ,  x  ,  x 

thus  obtaining  n+m  linear  equations ,  homogeneous  in 
xn+m-l^  ^n+m-2^  ^  i_  #  Prom  the  theory  of  linear  equa¬ 

tions  we  have  that  the  necessary  and  sufficient  conditions 
for  the  two  polynomials  in  one  variable  to  be  relatively 
prime  is  that  their  resultant 


,ai 

.b 


n> 


m 


a  .  .  .  . . 

. a  0  .  . 

0 

o 

n 

0  a^  .  . . 

.  .  . . . a 

0  .  .  .  . 

.  0 

0 

n 

0 . 

o 

n 

0  - -  • 

b  •  •  •  ■ 

0 

m 

b  .  .  .  . 

.  b  0 

.  0 

o’* 

m 

does  not  vanish. 

We  define  the  i-th  subresultant  of  the  two 
polynomials  in  one  variable  as  the  determinant  obtained  by 
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striking  out  the  first  and  last  i  rows  and  columns  from 
the  resultant  of  these  two  polynomials.  The  degree  of  the 
greatest  common  divisor  of  f(x)  and  g(x)  is  then  equal  to 
the  subscript  of  the  first  of  the  subresultants  Ro=Rj(R^, 
R0,  . . .  which  does  not  vanish. 

A  more  efficient  representation  for  the  result¬ 


ant 


R  (b°,,,M,n)  ,  where  n-m=r, 
oJ  ' '  *  *  m 

is  possible  which  requires  only  an  nxn  determinant  whose 

elements  are  the  minors  of  matrix: 

a  a,  ...  a  a  .  -i  ...  a 
r  o  1  r  rfl  n  -> 

L . b  bT  .  .  .b  J 

o  1  m 

The  case  for  n=3j  m=2  is  given  as 


an->  9--i  >  9-p  j  ap 

R  A  b  t  >  = 

DoJ  Dl* °2 


o  o 


a  bn 
o  1 


aot,2 


aobX  I  (I  aob2  1  +^lbl  0  I  a]_b2 


aob2 


alb2 


agbg 


A  more  extensive  treatment  of  this  subject  may  be  found  in 
Bocher  (1922)  and  Turnbull  (1944)  . 


§  2.32  Upper  Bounds  to  the  Real  Roots  of  a  Polynomial 


In  order  to  determine  the  real  roots  of  a  numerical 
equation  it  is  advantageous  to  narrow  the  region  within 
which  they  must  be  sought.  In  this  section  we  shall  concern 
ourselves  with  the  problem  of  determining  the  upper  bound 
to  the  positive  roots  of  P  (x);  the  lower  bound  on  the 
negative  roots  can  be  founQ  in  a  similar  manner. 
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§  2.32.01  Lemma 

An  upper  bound  to  the  positive  roots  of 

t->/\  n  n— 1,  ~ 

Pn(x)  =  x  +  a-j^x  +  ...  +  an  =  0 
is  given  by  _ 

tyKI  +  ^ 

n — r1 

where  apx  is  the  first  negative  term  and  a^  is  the 
largest  negative  coefficient. 

Proof:  Let  x  be  such  that 


n  ^  i 

x  >  a 


k 


n-r  ,  n-r-1 ,  ,  ,  n v  .  » 

x  +  x  +...+x  +  l)>a 


k 


n-r+1 , 

x _ — 1 

x-1 


and  hence  Pn(x  )  >  0. 


Taking  x  >1 

n 

x 

n+1 

x 

r-1 

x 


,  the  above  inequality  is  satisfied  by 

>|aklxn’r+1/x“1'  or 

n  .  i  I  n-r+1 
-x  >jakjx  ,  or 

(x-1)  >  jal  which  inequality  is  again 


satisfied  by 

(x-l)r_1(x-l)  >  |akj, 
r-1  /  \ r-1 

since  x  >  (x-1)  .  We  have,  therefore. 


(x-l)r  >|ak|,  or  X  >  1  +  r/^ 


§  2.32.02  Lemma 

If  in  any  polynomial  each  negative  coefficient 
is  taken  positively,  and  divided  by  the  sum  of  all  the 
positive  coefficients  which  precede  it,  and  if  q  is  the 
greatest  of  these  quotients,  then  q  +  1  is  an  upper  bound  of 
the  positive  roots. 


§  2.33  Approximate  Determination,  of  Roots 


The  numerical  solution  of  an  arbitrary  high  degree 
polynomial  is  best  affected  in  several  steps.  The  first 
step  consists  of  the  approximate  determination  of  the 
moduli  of  some  or  all  of  the  roots  by  e.g.  the  root-squaring 
process.  Higher  accuracy  if  required  can  be  achieved  by  an 
iterative  procedure  such  as  Newton's  rule,  for  real  zeros, 
and  Bairstow's  formula,  for  complex  zeros.  In  this  section 
we  review  the  graphical,  Bernoulli-Aitken,  and  Graeffe 
root-squaring  methods  for  the  approximate  determination  of 
roots.  A  more  extensive  account  of  these  methods  is 
available  in  Olver  (1951) . 

An  account  of  the  numerical  solution  of  algebraic 
equations  using  an  electronic  computer  for  Bernoulli's 
method,  the  root-squaring  method,  and  the  Newton-Raphson 
method  is  available  in  Brooker  (1952) .  A  more  recent 
approach  to  the  algebraic  problem  known  as  "direct  search" 
is  given  in  Ward  (1957)  and  Lance  (1959). 

§  2.33.01  The  Graphical  Method 

The  problem  of  determining  the  roots  of  Pn(x)  =  0 

can  be  reduced  to  the  problem  of  graphing  P  (X)  in  certain 

n 

narrow  regions.  Information  can  be  gained  concerning 

complex  roots  by  observing  that  a  pair  of  complex  roots  exist 

at  x  =  x  if 
o 

(i)  p'(*0)  =  0,  P^(X0)  >  o,  Pn(x0)  >  0, 

or  (ii)  py(x  )  =  0,  P/7(x  )  <  0,  P  (x  )  <  0. 

'  '  n  o'  n  '  o'  nv  oy 

The  Graphical  Method  of  solving  for  the  roots  of  a  poly¬ 
nomial  can  be  readily  extended  to  non-algebraic  functions 
and  also  to  the  case  of  two  polynomial  equations  in  two 
unknowns . 

§  2.33.02  Graeffe 's  Root-squaring  Process 


The  root-squaring  process  is  essentially  a  means 
of  calculating  the  moduli  of  the  roots.  The  basis  of  this 


method  Is  that  if  the  polynomial 

/  \  n  ,  n-1 

P„  (x)  =  ax  +  a-,x  +  .  .  . 
n  o  1 

has  n  simple  zeros  arranged  as  |  x^^  |  > 


+  cl 


rr 


x2  I  - 


>  I  x. 


2  2 

then  a  polynomial  whose  zeros  are  -x-^  ,  -x^  , 


-x 


n 

2 


n 


is  given  by 


P „(x)  =  b  xn  +  bn  x 


n 

t 

where  b  =  a 
s  s 


n-1 


o 


+  ...  +  V 


2as-l  as+l  +  2as-2  as+2 


( s=0 , 1,  .  . .  ,n 


The  application  of  this  transformation  m  times  in  succes- 

M  M 

sion  yields  a  polynomial  whose  zeros  are  -x-^  ,  -x^  ,  .  .  .  , 

-x  where  M  -  2m.  For  sufficiently  large  m,  the  trans¬ 
formed  polynomial  breaks  up  and  the  moduli  of  the  zeros  can 
be  computed  from  the  ratios  of  the  arithmetical  M-th  roots 
of  adjacent  coefficients  (see,  e.g.,  Whittaker  &  Robinson, 
1944,  §  §  54  to  58) . 


A  major  difficulty  arising  in  hand  computing 
applications  of  the  root-squaring  process  is  the  checking 
for  errors  in  each  transformation;  undetected  mistakes  can 
make  subsequent  computations  useless.  The  only  suitable 
check  available  is  the  unsatisfactory  method  of  ordinary 
duplication.  The  computational  arrangement  of  the  calcula¬ 
tions  is  very  important  and  the  technique  of  handling  the 
coefficients  in  the  form  N  =  p  xlOM —  where  p  lies  between 
1  and  10  and  q  is  an  integer  —  is  recommended.  Important 
difficulties  which  often  occur  with  the  root-squaring  of 
high- degree  polynomials  are 

(i)  the  failure  of  the  polynomial  to  break  up 


' 

:■  ■  .0.  ;  :  ■  'T 
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after  a  resonable  number  of  transformation, 

(il)  the  severe  loss  of  significant  figures 
because  of  cancellation. 

§  2.33.03  Aitken-Bernoulll  Method 

Prom  the  standpoint  of  speed  and  certainty  of 
success  the  Aitken-Bernoulli  process  is  inferior  to 
Graeffe's  root-squaring  method.  This  method  is  useful, 
however,  for  evaluating  one  or  two  extreme  zeros  and  it  may 
be  used  on  automatic  computing  equipment.  A  detailed 
account  of  this  method  is  given  by  Aitken  (1926) . 

The  basis  of  Bernoulli's  method  is  that  if  the 

polynomial 

Pn(x):f(x)  =  aQxn  +  a1xn_1  +  ...  +  a^x  +  aR 

has  n  simple  zeros  x-^,  x^,  .  ..,  xn  (assumed  arranged  in 
the  form  I  xn  I  >  I  x0  I  >  . . .  >  I  x  I  )  then  the  difference 
equation 

a0fi(t+n)  +  ...  +  an_1f1(t+l)  +  anf-^(t)  -  0  (1) 

has  the  general  solution 

\j  ”fc 

f1(t)  =  w1x1  +  w2x2  +  ...  +  wRxn  , 

where  w^,  w 2,  ...,  wn  are  arbitrary  constants.  If  |  x^  |  >  [ 

x2  |  then  clearly 

f1(t)  'v  w1x1t  and  f^  (t+l)/f^  (t)->  x1,  as  t-*oo.  (2) 

From  a  sequence  of  numerical  values  of  f^(t)  constructed 
by  using  (l)  as  a  recurrence  relation,  the  value  of  x^  can 


<  <  < 


< 
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usually  be  obtained  to  any  given  degree  of  accuracy  from 
(2),  provided  the  sequence  is  taken  far  enough.  The 
factor  x-x^  can  be  removed  from  P  (x)  and  the  process 
repeated  to  obtain  x0 ,  provided  |  x0  |  >  |  x^  |.  Further 
applications  enable  P  (x)  to  be  solved  completely,,  provided 
its  zeros  are  all  real  and  distinct  . 

Aitken's  generalization  is  designed  to 
determine  all  the  zeros  from  the  sequence  jf^(t)|  and  to 
include  the  cases  of  complex  and  multiple  zeros.  He 
constructs  further  sequences  <jf  (t)>  defined  by 


f2(t)  = 


fx(t)  f^t+l) 
f^t-l)  f1(t) 


f 


s+1 


(t)  - 


f  (t)  f  (t+lj 

f s ( t_1 )  fs(+) 


1  f 


s-1 


(t) 


(s  >  2),  and  proves  that  if 
zs(t)  E  fs(t+1)  /  fs(t)  and  |  xg  |  >  |  xs+1  |  ,  then 

zg(t)->xix2...xs,  as  t->oo.  (3) 

This  result  allows  us  to  evaluate  numerically  the  real  zeros 
and  moduli  of  the  complex  zeros.  We  consider  the  case  of 

Xf,  x2  real  and  distinct,  x^,  x^  a  pair  of  complex  conjugate; 

+i-0 

"  ^  ~ '  -  --  -  Aitken  proves  that 


re- 


and  that  |  x2  |  >  r  >  |  x^  | 

Z^(t)  o.  x^x2  r  cos  j  ( t+1)  -©  +  a  |  sec  ( t-0+a) , 

where  a  is  an  arbitrary  constant,  and  hence  that 
jz3(t+l)  Z3(t)  +  k2  |  /Z3(t)->  2k  cos  -0, 

where  k^  x^x2r  and  is  known  by  a  previous  application  of  (3). 

A  serious  disadvantage  of  Aitken's  process  is 
the  severe  cancellation  that  occurs  in  the  numerical  calcula¬ 
tion  of  the  sequence  < f  (t)  V  .  In  fact,  it  can  be  shown 
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that  if  R  leading  figures  of  Z_(t)  and  Z  (t-1)  are  identi- 
cal,  then  R  figures  are  lost  'off  the  front1  in  forming 
fg+-^(t).  Since  R  increases  with  t,  this  cancellation  be¬ 
comes  more  and  more  severe  as  the  work  progresses  and  hence 
a  large  number  of  significant  figures  must  be  retained  in 
the  early  sequences  of  the  process. 


§  2.33.04  Whittaker's  Theorem 

The  root  of  the  equation 

P  (x)  =  a  x11  +  .  .  .  +  a  =0 
n v  '  o  n 

which  is  smallest  in  absolute  value,  is  given  by  the  series 


-a 


n 


2 

a  a  0 
n  n-2 


n-1 


a  i 
n-1 

a 

n- 

a 

n 

n-1 


a 


3 


n 


n 


n-2 

a  0 
n-3 

n-1 

a  0 
n-2 

an-2 

a  1 
n-1 

an-2 

an-l 

a 

n 

an-l 

0 

an 

n-1 


provided  that  the  series  converges.  There  is  no  indication 
in  the  literature  that  this  method  has  been  successfully 
used  in  large-scale  examples.  Experience  would  suggest  that 
methods  based  on  the  evaluation  of  determinants  of  large 
order  is  laborious  and  inefficient.  A  proof  of  Whittaker's 
Theorem  is  given  in  Whittaker  and  Robinson  (1944)  and  in 
Turnbull  (1944). 

§  2.34  Improving  the  Accuracy  of  Roots 


Given  some  approximation  to  a  zero,  an  iterative 
process  if  successful  must  produce  a  better  approximation. 
Except  in  special  cases  (such  as  the  well-known  procedure 
for  evaluating  a  square  root),  convergence  is  not  guaranteed. 
Both  the  possibility  and  the  speed  of  convergence  depend  in 
general  upon  the  accuracy  of  the  first  approximation. 


✓ 


■ 
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Iterative  methods  are  therefore  most  useful  when  a  fair 
approximation  is  already  known.  Iteration  should  be 
regarded  as  a  supplementary  process  to  some  direct  method 
such  as  root- squaring. 

A  measure  of  the  power  of  an  iterative  process 
can  be  taken  as  the  degree  or  order  of  convergence  which 
is  defined  as  follows: 

If  a  is  a  simple  or  multiple  zero  of  f(x),  a 
is  an  approximation  to  a  and  r]—  a-a,  then  an  iteration 
formula  of  the  form 

t|  =  F(a)  +  0(  |n|  k)  (k  >  1  ) 

is  said  to  be  convergent  to  the  k  th  degree.  We  note  at 
this  point  that  we  need  not  restrict  this  process  to 
algebraic  problems,,  Olver  (1952). 


§  2. 34.01  Method  of  False  Position 

If  f(x)  has  opposite  signs  at  x  =  x^  and  x  = 
then  at  least  one  root  lies  between  x-^  and  .  Regarding 
x^  and  x0  as  approximations  to  the  root,  we  can  obtain  a 
new  approximation  x^>  by  inverse  linear  interpolation,  . . 
according  to  the  formula 


xxf (x2) 
f  U2) 


x2f (xq) 

fUi) 


(x2-xi)  f(x2) 
f(x2)  -  f(xx) 


This  process  can  be  repeated  and  is  convergent 
if  the  function  is  continuous  in  [x-^,  x^]. 


§  2. 3^.02  Newton-Raphson  Method 

If  x^  is  an  approximation  to  a  zero  xQ  of  f(x) 
then  in  general  a  better  approximation  is  x^  +  6x1, 

6xl  =  -  f (xx)  /f/(  }  . 


where 


' 


' 


This  formula  can  be  shown  to  be  quadratically  convergent. 

Two  conditions  which  need  to  be  satisfied  in  order  that 
this  method  converge  to  a  real  root  are 

(i)  f7(x)  /  0  for  x  e[x1,xQ])  and 

(ii)  f/;(x)  /  0  for  x  e[x^x()]J 
Whittaker  and  Robinson  (1944) . 

§  2.34.03  Cubically. Convergent  Formulae 

A  simple  formula  is 

6x1  =  -  f(x1)  -  (f(x1))2  f//(x1) 

(xx)  2J?' (x1) )3 

Another  formula  is  that  of  Laguerre 
6x-^  =  -  nf(x1) 

f  ±(  ( n~  1 )  2  ( f  /(x1)  )2-n(n-l)f  (x1)f//(x1) ) 

where  n  is  the  degree  of  f(x).  This  method  amounts  to 
approximating  the  polynomial  by  parabolas  between  two  zeros ^ 

Todd  (1962) . 

§  2.34.04  Bairstow3s  Method 

Let  the  results  of  dividing  Pn(x)  twice  in 
2 

succession  by  x  -px-$  be  denoted  by 

Pn(x)  =  (x2-p x-Jl)  Pn_2(x)  +  qxx  +  (qQ  -  pq1), 

Pn_2(x)  =  (x2-Px‘^)  Pn_i|(x)  +  Tlx  +  (To  ~  pTl^*  (1) 

Then  Bairstow3s  formula  is  given  by 

D6p  =  T1qQ  -  TQq1,  Vdt  =  Mc^-  TQqQ, 


•  ovT 


' 


■ 

% 


(2) 


where  M  =  +  pTQ  ,  D  =  TQ^-  MT^  . 

Simple  checks  on  the  computation  are  given  by 

T16i  +  TQ5p  =  -q1  ,  TQ6,0  +  M6p  =  -qQ. 

It  can  be  shown  that  this  method  is  quadratically  convergent,, 
M.C.M.  (1961). 

§  2.39  Evaluation  of  Zeros  of  Ill-conditioned  Polynomials 

Let  a  be  a  simple  zero  of  the  general  polynomial 
P  (x)  given  as 

Pn(x)  =  xn  +  a1xn_1  +  ...  +  an,  i.e.  aQ  =  1. 

Then  if  a  is  increased  by  6a  ,  the  value  of  P  (a) 
n-m  J  n-m  nv  ' 

changes  from  zero  to  a,n_m  6a  ,  to  the  first  order  of 

n-m 

approximation.  Hence,  by  Newton’s  rule,  the  corresponding 

change  in  the  zero  is 

.  n-m, 

6a=-a  5an_m/pn/  (a)_ 

For  the  case  of  a  double  root  we  obtain 

(5a)2  =  -  2  &  an_m  an‘m/p''(a). 

The  direct  use  of  these  formulae  enable  us  to  determine 
quite  simply  the  limitations  of  accuracy  that  rounding  or 
observational  errors  in  the  coefficients  impose  upon  each 
real  simple  or  double  zero,  Wilkinson  (1959)  and  Olver  (1951). 

An  example  given  in  Wilkinson  (1959)  is  reviewed 
here.  The  polynomial  is  of  degree  20  and  is  defined  by 

P20(x)  =  (x+1) (x+2) . . . (x+20) . 


. 


...  .  1  .  r  .q TT fc-;  f  . 
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Clearly , 


and 


p'0W 


n 


a. 


6a20 


~~n-m , 
•20  6a. 


n-m 


T97 


This  takes  its  greatest  value  for  n-m  =  1 9 ^ 
namely 

6a20  =  6al  £  0.43xl086a1. 

The  20th  root  is  not  the  most  sensitive  to  variations  in 
a-^  however.  We  have  in  fact 

6 Or-  =  1519  6a-,  ~  0.21xl0106a,  ,  and 

5  5ITT.'  1  1 

6ai,  =  l619  6a,  £  0 . 24xl01°6a1  . 

4  TTT5.'  1  1 


We  observe  that  the  multiplication  factors  are  so  large 

_7 

that  for  a  change  of  order  10  in  6a^j  the  linear  approxima¬ 
tion 

6a  =  -  an‘m  6an_m/P^  (a) 


is  completely  invalid;  e.g.  a  pair  of  complex  roots  of  the 
new  polynomial 

P2Q(x)  =  (x+l)'(x+2)  ....  (x+20)  +  2_28x19  are 


-16.730737466  +  2. 8126248941 
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§  2.4  Polynomials  in  Several  Variables 


Our  review  of  the  single-variable  algebraic 
problem  is  now  extended  to  the  nonlinear  algebraic  problem 
in  several  variables.  We  concern  ourselves  with  the  case 
of  n  real  variables  x,,  x?,  x  ,  denoted  by  x:nxl. 

A  more  extensive  treatment  of  thisnproblem  is  given  in 
Bocher  (1922). 

§  2.41  Theorems,  Definitions,  and  Algorithms 


§  2.41.01  Definition 

By  a  factor  or  divisor  of  a  polynomial  f(x)  is 
understood  a  polynomial  g(x)  which  satisfies  an  identity  of 
the  form 

f(x)  =  g(x)  h (x)  ;  x  :  nxl  , 
h(x)  being  also  a  polynomial. 

§  2.41.02  Definition 

A  real  polynomial  is  said  to  be  reducible  in  the 
domain  of  reals  if  it  is  identically  equal  to  the  product 
of  two  real  polynomials  neither  of  which  is  a  constant. 

§  2,41.03  Definition 

Two  polynomials  are  said  to  be  relatively  prime 

if  they  have  no  common  factor  other  than  a  constant. 

Let  f (x^x^)  be  any  polynomial  in  two  variables 
x-^j  x^  arranged  according  to  powers  of  x-^  , 

f(x1,x2)  =  aQ(x2)x1n  +  ...  +  aR(x2)  =  0  , 

the  aJs  being  polynomials  in  x2 . 


i  1  Mj  si'TiO'  r^Loq  rrirxf  '  \e  orf: 
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§  2.  41.04  Theorem 


A  necessary  and  sufficient  condition  that  a 
polynomial  in  x0  alone,  g(x0),  be  a  factor  of  f(x-^,x0)  is 
that  it  be  a  factor  of  all  the  a's. 

§  2.41.05  Theorem 

If  two  polynomials  f(x1,x2)  and  g(x2,x2)  are 
relatively  prime,  there  is  only  a  finite  number  of  pairs 
of  values  of  (x-^,x2)  for  which  f  and  g  both  vanish. 

§  2.41.06  Remainder  Theorem  for  Polynomials  in  Two  Variables 

A  necessary  and  sufficient  condition  that  f(x2,xp) 
and  g(x^,x2)  have  a  common  factor  which  involves  x^  is  that 
R(xp)=  0,  where  R(xp)  is  defined  by  Euclid's  algorithm  in 
§  2.41.08. 

§  2.41.07  Remainder  Theorem  for  Polynomials  in  Several 
Variables 

A  necessary  and  sufficient  condition  that  f(x) 
and  g(x);  x:nxl,  have  a  common  factor  which  involves  x^  is 
that  R(x2, . . . ,xn)  -  0. 

Theorems  §  2.41.04  and  §  2.41.05  can  also  be 
readily  extended  to  n  variables  simply  by  replacing  y  by 
the  n-1  variables  xp,...,x  .  A  more  extensive  treatment  on 
this  topic  is  given  in  BocRer  (1922). 

§  2.41.08  Euclid's  Algorithm  of  the  Greatest  Common  Divisor 

for  Polynomials  in  Several  Variables 

We  consider  the  special  case  for  polynomials  in 


two  variables.  Let  f(x1,xp)  and  g(x1,xp)  be  any  two 


•  .  ; . 


polynomials  in  x^  and  x0  arranged  according  to  powers  of  x, 
f(x1,x2)  =  aQ(x2)  x-Ln  +  ...  +  an(x2)  ,  aQ{x2)fO, 
g(x1,x2)  =  bQ(x2)  X]IU  +  +  bm ( x2 )  i  bQ^0,n>m>0. 

Dividing  f.(x-^x,J  by  g(x-^,x0)  wc  obtain  the  identity 

p0(x2)  f(xi^x'2)  E  .Q0(x1,x2)  g(x1,x2)  +  R1(x1,x2). 

We  now  define  the  computing  algorithm. 


R(xx,x2)  <- 

P(x2)f (x-^Xg)  -  Q(x1Jx2)g(x1Jx2) 

R(x1,x2) : 

R(x2)  ,  END 

f(x]_,x2)  - 

S ( X1 ^ x2) 

g(x1,x2)  <- 

R(x1,x2) 

-> 

1 

END 

§  2.42  Sylvester^  Dialytic  Method  of  Elimination 

From  n  separate  equations  it  is  generally  possible 
to  eliminate  n-1  unknowns.  Sylvester's  dialytic  method 
provides  a  systematic  means  of  eliminating  one  unknown  from 
two  equations.  From  a  reverse  point  of  view  the  vanishing 
of  the  resultant  may  be  considered  as  the  condition  under 
which  the  two  equations  in  question  possess  a  common  root. 

We  consider  the  case  of  two  equations  in  two 
unknowns;  i.e.  let  f(x^,x2)  and  g(x1,x2)  be  any  two  polynomial 
in  x-^  and  x 2  arranged  according  to  powers  in  x-^, 

f(x1,x2)  =  aQ(x2)  X1R  +  •••  +  an(x2^ 

g(x1,x2)  =  bQ(x2)  xim  +  •••  +  bm(x2)>  where  n>m>0 


. 


.  . .  -  . . .  . -  ■  —  — . . . — - - J . . . •••*•'  . - -  .•••-. 


We  multiply  the  first  equation  by  the  successive 


powers  of  x^ , 


and  the  second  by 


x- 


m-1 


m-2 
1  ' 


‘1  J 


‘  n-1  n-2 

x1  ,  X^  ,  .  .  .  ,  X- 


thus  obtaining  n  +  m  linear  equations,  homogeneous  in 
n+m- 1 


x. 


,  .  .  . ^  Xx,  1 


The  condition  for  their  consistency  is, 


by  the  theory  of  linear  equations. 


R  - 


ao 

(x2) . 

•  *  an ( x2 )  0 . 

.  .  .0 

0 

a0U2) 

. aTx2> 

0.  .0 

0 

b0(x2) 

. Vx2> 

0.  .0 

bo 

(x2) . 

••VX2)  0 . 

.  .  .0 

= 

0  can  also 

be  written  as  P( 

x2)  = 

=  0 


The  roots  of  this  polynomial  are  possible  solutions  for  x2 
of  the  simultaneous  system.  The  values  of  x-^  can  then  be 
found  by  substitution. 

Prom  three  equations,  consistent  in  two  unknowns 
x-^  and  x^,  we  could  eliminate  x-^  from  the  first  and  second,  and 
again  from  the  first  and  third  equation,  giving  two  resultants 
containing  x^,  from  which  x^  could  be  eliminated  by  a  like 
procedure.  We  note,  however,  that  the  total  degree  of  the 
resultant  polynomial  can  become  very  large. 

§  2.42.01  Example  of  Sylvester's  Dialytic  Method 

Find  the  simultaneous  solution  for 

2  2 

x  +  y  -  y  =  1 
=  1  . 


x  y 
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using  the  Dialytic  method  we  obtain 


1  0 

(  0  y 

y  -i 

For  consistency  we  must  have 

1  0 
P(y)  =  0  y 

y  -i 


C.  T 

y  -y  -i 


X 


-1  )  (  X  )  =  0 


0 


p 

y  -y-i 


-l 

o 


i .  e 


-1  +  (y2-y  -1) (-y2)  =  0  ,  or 


=  0  , 


Pj,(y)  =  y4  -  y3  -  y2  +  i  =  o 


4 

This  fourth  degree  polynomial  has  at  most  two  real  roots  by 
Descartes  Rule  of  Signs. 


§2.5  Curve  Tracing  of  Polynomials  in  Two  Variables 


The  graph  of  a  function  may  in  some  cases  be  an 
important  aid  in  the  solution  of  an  algebraic  problem. 

Although  the  graph  cannot  be  said  to  represent  a  function 
accurately,,  nevertheless,  it  is  a  useful  technique  for  exhibit¬ 
ing  both  the  nature  and  approximate  value  of  a  solution.  The 
special  techniques  available  for  graphing  a  function  in  two 
variables  are  not  well-known  and  hence  the  following  review 
is  included  in  this  thesis.  Specifically,  we  will  discuss  the 
properties  and  use  of  the  analytic  triangle. 

§2.51  De  Gua  Triangle 


We  construct  the  following  diagram  and  label  it 


as  shown. 


. 


- 
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An  equation  is  said  to  be  placed  on  the  triangle 
if  a  cross,  or  dot  is  placed  on  the  center  of  the  rectangle 
corresponding  to  a  term  of  the  equation.  In  the  diagram  shown 
we  have  placed  the  equation 

f^(x,y)  =  ay  [  +  bx^y  +  cx~V  +  dxy"  +  exL  +  fxy  +  gx2y2  . 

We  then  proceed  to  form  a  convex  polygon  exterior  to  which 
no  point  lies. 

Another  triangle  scheme  employed  is  as  follows, 
y  x 


We  place  the  same  equation  on  this  analytic 
triangle  as  for  the  De  Gua  triangle.  The  analytic  triangle 
is  more  useful  since  fractional  indices  -  hence  the  name 
analytic  -  can  be  placed  on  the  triangle  more  easily/ 

§  2.32  Properties  of  the  Analytic  Triangle 

When  all  the  terms  of  an  equation  are  placed  on 
the  triangle,  the  following  properties  hold,  with  respect 
to  any  straight  line  which  contains  two  or  more  circles. 

(1)  If  every  term  not  on  a  straight  line  is 
rejected  the  resulting  equation  gives  one  or  more  constant 


•  '  "  L‘ 


-  'V  -r  i.  ;  )T:s ;  f^tfa  v;ns  ocf 
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S  3? 

values  of  the  ratio  :  x  ,  where  the  values  of  r  and  s 
depend  only  on  the  direction  of  the  line. 

(2)  If  the  extended  straight  line  intersects  both 
sides  of  the  triangle  or  the  extended  triangle  with  the 
relation  y  :  x  the  terms  of  the  original  equation  on  the 
same  side  of  the  line  as  the  right  angle  will  be  less  and 
the  terms  on  the  other  side  of  the  line  will  be  greater 

than  any  of  the  terms  on  the  line  for  x  and  y  very  large; 
the  reverse  holds  for  x  and  y  very  small  if  there  is  no 
constant  term. 

(3)  When  the  line  makes  acute  angles  with  both 

S  2? 

sides  of  the  triangle,  the  relation  is  of  the  form  y  x 
constant;  when  y  is  very  large  and  x  very  small  the  terms 
on  the  y  side  are  greater,  those  on  the  x  side  smaller  than 
any  term  on  the  line  and  vice  versa  for  x  great  and  y  small. 

(4)  When  the  line  is  parallel  to  one  of  the  sides, 
say  to  Ox,  the  resulting  equation  gives  one  or  more  straight 
lines  which  are  parallel  to  Oy;  and  when  y  is  very  large  each 
term  on  the  side  o.f  0  is  less  and  those  terms  on  the  other 
side  are  greater  than  the  terms  on  the  line. 

(5)  When  the  line  coincides  with  a  side  of  the 
triangle  the  resulting  equation  defines  the  points  of  inter¬ 
section  with  the  corresponding  axis . 

Furthermore  if  any  line  gives  a  first  approximation  to  an 
asymptote,  the  terms  to  be  taken  into  account  for  the  second 
approximation  are  found  by  moving  the  line  parallel  to 
itself  until  it  passes  through  another  dot  or  group  of  dots, 
all  of  which  correspond  to  the  terms  required  to  be  taken 


■ 

. 

. 


■. ^  i  T.  v.t;ja  w  c-rKt  (xO  oj  \'3a 
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into  account. 

§  2.53  Examples  of  the  Method 
§  2.33.01  Example  I 


or  y  =  +  b  x  ,  the  usual  expression  for  the  asymptotic 
a 

lines  of  an  hyperbola. 

a  -.2  2  2,  2  n 

P7  :  b  x  -ab  =  0  , 

or  x  =  +  a  ,  the  intercepts  for  the  hyperbola. 

2  2  2  2 

ya:  -ay  -  ab  =  0 


0 


o  ;  o  •• 

f  «  . "  ‘10 
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y 


X 


.6 

6 


23  33  23  32 

7^:  2a  x^y  -  by,  or  2a  xJ  -  bJy  =  0 


_  x 


x 


x 


§  2.6  Systems  of  Linear  Equations 


An  important  technique  in  solving  systems  of 
nonlinear  equations  is  the  method  of  iteration.  The  merit 
of  the  iterative  or  approximate  schemes  is  that  they  em¬ 
ploy  simple  and  uniform  operations  and  hence  are  easily 
mechanized.  Before  we  proceed  to  the  application  of 
iterative  methods  to  nonlinear  systems  we  find  it  conven¬ 
ient  to  consider  first  the  theoretical  and  computing 
aspects  of  linear  systems.  In  this  way  we  encounter  in 
a  simple  context  most  of  the  matrix  properties  we  require. 


The  problem  of  solving  a  non-homogeneous  linear 
algebraic  system  is  closely  related  to  the  problem  of  invert¬ 
ing  a  matrix  and  the  problem  known  as  elimination.  Numerical 
methods  of  solving  the  stated  problem  are  divisable  into  two 
groups:  exact  and  Iterative  methods.  The  fundamental  method 
which  is  referred  to  as  an  exact  method  is  the  Gauss  method 
of  elimination.  The  Gauss  method  is  reviewed  in  this  thesis 
although  it  has  been  superseded  by  the  more  compact  and 
efficient  schemes  such  as  the  square-root  and  L-U  methods. 
Iterative  methods  which  in  general  require  only  simple, 
uniform  operations  and  hence  are  easily  mechanized  afford  a 
means  by  which  a  system  of  linear  equations  may  be  solved 
approximately.  Before  we  proceed  with  methods  of  solution  we 
review  the  more  basic  definitions,  notation,  and  theorems  of 
linear  algebra.  The  elementary  accounts  can  be  found  in 
Fadeeva  (1959)  and  Hildebrand  (1952) . 

Further  references  on  the  topic  of  linear  algebra 
include  a  treatment  at  an  intermediate  level  by  Fox,  Husky, 
and  Wilkinson  (1948);  a  study  on  relaxation  methods  by  Temple 
(1938);  and  a  comprehensive  work  with  an  extensive  biblio¬ 
graphy  by  Forsythe  (1953) .  A  more  recent  account  of  iterative, 
accelerated  iterative,  gradient,  and  conjugate  gradient 
methods  is  given  in  Hestenes  and  Stiefel  (1952)  and  Martin  and 
Tee  (1961) . 

§  2.61  Theorems,  Definitions,  and  Notation 


§  2.61.01  Matrix 


A  matrix  is  defined  as  an  array  of  numbers, 
complex  in  the  general  case,  which  can  be  written  in  the  form: 

a- 

AE  Uij]  E 


'11 

a12  * •  •  ' 

• ‘ ’ *aln 

'21 

9*22  *  0  *  * 

.... a2n 

'ml 

m2 

'  '  ' *  mn 

consists  of  m  rows  and  n  columns  of  elements,  where  in  the 

typical  element  a.  .  ,  the  first  subscript  (here  i)  denotes 

a  J 

the  row  and  the  second  subscript  (here  j)  the  column  occupied 
by  the  element. 


•  ■  :  -  -  ■  \’r 

■ 
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§  2.61.02  Trace  of  a  Matrix 

The  quantity  an ,  +  a00  +  . . .  +  a  is  called  the 
J  11  22  nn 

trace  of  the  matrix  A  and  is  denoted  by  Trace  A. 


§  2.61.03  Transpose  of  a  Matrix 

T 

The  transpose  of  a  matrix  A  denoted  by  A  is 
obtained  by  interchanging  the  rows  and  columns  of  A^  i.e. 

A  =  taij]  =  ^ajJ 

§  2,61.04  Inverse  of  a  Matrix 

A  matrix  B  is  called  the  Inverse  of  a  matrix  A 
denoted  by  A-1  if 

AB  =  BA  =  I  j  or  AA-^  =  A-"^A=I  where  I  is  the 


identity  matrix 


1  =  1 j  1  • 


§  2.61.05  Adjoint  of  a  Matrix 


A  matrix  B  is  the  adjoint  matrix  of  A  if 

B  ”  CV 

where  A.,  is  the  signed  determinant  obtained  by  striking  out 
J  u 

the  i  th  row  and  j  th  column  of  A  and  multiplying  by  (-l)1+J 
The  adjoint  matrix  B  satisfies  the  equation 

AB  =  |  A  |  I 

and  is  related  to  the  inverse  matrix  A  ^  by 

A-^  =  B/  I  A  |  ,  where  |  A  |  ^  0. 

§  2 . 61.06  Matrix  Product 


The  multiplication  of  two  matrices  A  and  B  is 


7,'v  c t.nl  ■  >  7  •"{  j  '  :■  v..:' 


.  ■■ :  f:  -  ■  .:rti  '  . :  3  j  ,  ,  :  .'W 
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denoted  by 


C  =  AB 


where  C  is  a  matrix  whose  elements  c  are  given  by 


°ij  =  a^  b^j  ,  summed  over  l 


§  2.61.07  Matrix  Norms 


n 

1. 

1  1  a  I 

I  x  =  max  2  , 

1  I  i  R=1 

1  aik 

n 

2. 

1  1  a  I 

I  XT=  max  2 

11  k  1=1 

1  aik 

3. 

1  1  A  | 

1  HI  ^  1,  k=l 

2 

aik 

4. 

1  1  A  | 

|  jy  =  max  ( 

( Ax) T. 

1 

2 


1 

,T„  N  2 


1 

2 


T 


The  above  norms  will  be  referred  to  as  I,  II,  III,, 
IV,  respectively.  Norms  I,  II,,  III  are  applicable  to  either 
square  or  rectangular  matrices  but  norm  IV  is  applicable 
only  to  a  square  matrix. 

All  four  norms  satisfy  the  following  conditions: 

(1)  N(A)>0  if  A/0  and  N(A)=0  iff.  A  =  0; 

(2)  N(cA)  =  I  . c  I  N(A)  where  c  is  a  scalar; 

(3)  N ( A+B) <  N(A)+N(B)  ; 

(4)  N(AB)  <  N(A)  N(B)  ; 

(5)  matrix  norms  I,  II,  and  III  are  upper  bounds 
to  the  eigenvalues  of  any  matrix. 


§  2.61.08  Theorem  1 

A  necessary  and  sufficient  condition  that  Am->  0 
is  that  the  matrix  A  has  eigenvalues  A^  such  that  |  A^  |  <  |  , 


all  i . 


-  jo:  I<  • 
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Proof 

Let  A  be  brought  into  diagonal  form  i.e. 

A  =  Py\_P_1  where  6± ^  .  Then  Am  =  PA* 1'1  P_1 

where  =  A1in61  ^  . 

Hence  for  Am->  0  it  is  necessary  and  sufficient  that_A_r"  ->  0  , 
for  which  it  is  in  turn  necessary  and  sufficient  that  |  |  < 

all  i.  In  the  case  the  matrix  A  cannot  be  brought  into 
diagonal  form,  the  theorem  is  proved  either  with  the  aid  of 
considerations  of  continuity  or  by  passing  to  the  Jordan 
canonical  form. 

§  2.61.09  Theorem  2 

In  order  that  Am  ->  0  ,  it  is  sufficient  that  any 
one  of  the  norms  of  A  be  less  than  unity. 

Proof 

Now  |  |  Am  |  |  <  |  |  Am_1  |  |  |  |  A  |  |  <  .  .  .  <  |  |  A  |  [\ 

Therefore  |  |  Am  |  |->0  if  |  |  A  |  |  <  |  and  thus  Am->  0 

§  2.61.10  Theorem  3 

The  necessary  and  sufficient  conditions  that  the 

series  „ 

2  m 

I  +  A  +  A  +  .  .  .  +  A  +  .  .  . 

be  convergent  is  that  Am  ->  0  as  m  ->  00  . 

Proof 

Prom  a  previous  theorem  we  have  |  I-A  |  ^  0  and 
therefore  (i-A)-'*' 


exists . 


i.  it,  .  '  ■ 
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Consider 

(I  +  A  +  A2  +  ...  +  Ak  )  (i-A)  =  I  -  Ak+1. 

Postmultiplying  by  (i-A)"1  we  have 

I  +  A  +  A2  +  . . .  +  Ak  =  (I-A)"1  -  Ak+1  (I-A)"1 
and  therefore 

L.H.S.  -»  (I-A)'1  since  Ak+1  ->  0  . 

§  2.61.11  Theorem  4 

An  estimate  of  the  rate  of  convergence  of  the 

series  of  the  previous  theorem  for 

|  |  A  I  |  <  |  is  |  |  A  |  |R+1  . 

1  -  I  I  A  M 

Proof 

We  have 

(1-A)"1  - (I+A  +. . .+Ak)  =  Ak+1+  Ak+2+  ... 

hence  |  |  L.  H.  S.  |  |  <  |  |  A  |  |k+1  +.  .  .  =  |  |  A  |  |k+1  . 

1  -  |  |  A  |  | 

§  2.62  Definition  of  the  Problem  of  Linear  Algebra 

A  general  set  of  n  linear  simultaneous  algebraic 
equations  in  n  unknowns  x-^  .  .  .  ,  xn  can  be  formulated  as 

allxl  +  a12x2  +  . +  alnxn  =  bl 

a21xl  +  . +  a2nxn  =  b2 

nl  1  n2  2  nn  n  n 

which  can  be  written  more  compactly  in  matrix  notation  as 
Ax  =  b  ;  A  :  nxn  ;  x  ,  b  :  nxl  . 


■ 
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The  necessary  and  sufficient  condition  that 
Ax  =  b  have  a  unique  solution  is  that  |  A  |  f  0  . 

§  2.63  Direct  Methods 

§  2.63.01  Cramer's  Rule 


If  A  is  nonsingular  then  we  may  write  the 
solution  of  our  system  of  linear  equations  as 


x  =  A 


-1 


b  =  J3_  b 
I  A  I 


where  B  is  the  adjoint  of  A. 

Hence  Xl  =  +  Ag±b2  +  ■  .  .  +  Anpn 

I  A  I 

where  A,  .  are  the  cofactors  of  the  element  a,  .  in  the 
ki  ki 

determinant  of  the  matrix  A.  If  we  define 

A-,  .b-,  +  An . bn  +  ...  +  A  .b  =  I  A.  I 
li  1  2i  2  ni  n  1  1  1 

we  may  denote  our  solution  as 


x,  =  I  A. 


(Cramer's  Rule). 


A 


This  result  serves  mainly  as  an  existence  theorem;  it  has  no 
practical  value. 

§  2.63.02  Gauss  Method 

The  problem  of  obtaining  the  simultaneous  solu¬ 
tions  of  n  linear  equations  in  n  unknowns  i.e.  Ax  =  b 
may  be  reduced  to  a  problem  of  solving  n-1  linear  equations 
in  n-1  unknowns  by  the  simple  method  of  elimination.  By 
adding  a  suitable  multiple  of  the  first  equation  to  all  other 
equations  so  that  in  each  resulting  equation  the  coefficient 


>0  - 


of  x.^  Is  zero  we  shall  have  remaining  n-1  equations  in  n-1 
unknowns  x0,  x,,  .  ..,  x^.  We  assume  here  that  |  a |  (  |  > 

|  akl  |  k  =  2j  ...)  n  and  hence  the  multipliers  are  clearly 
-a^j/ajp  -a^/a-jj,  ...  which  never  exceed  unity.  The  first 

equation  is  known  as  the  pivotal  equation  which  is  left 
unchanged  and  temporarily  left  aside.  We  now  continue  with 
the  reduced  problem  in  the  same  manner  and  obtain  finally 
a  single  equation  in  the  unknown  x  .  We  now  assemble  our 
various  pivotal  equations  and  obtain 


°llxl  +  °12X2  + 


C22X2  + 


+  c-,  x  =  dn 
In  n  1 


+  c0  x 
2n  n 


=  d. 


c  i  n  x  ,  +c  -i  x  =  d 
n-l^n-1  n-1  n-l,n  n  n  1 

c  x  =  d  . 

nn  n  n 

The  process  of  obtaining  the  above  triangular  system  of 
equations  is  known  as  Gaussian  elimination  or  pivotal  con¬ 
densation. 

We  can  now  solve  for  xn  from  the  last  equation 
and  by  substituting  in  the  previous  equation  obtain  xn 
and  so  on.  Practical  considerations  of  error  control  may 
compel  us  to  vary  this  procedure  somewhat;  hut  essentially 
the  above  description  covers  all  the  variants  of  the  Gauss 


method. 


■  , 


. 

.  0‘1  ■■£  \  .  I  . :  • 

■'  \H.QO 


§  2.64  Iterative  Methods 


§  2.64.01  Simple  Iteration 

The  linear  problem  Ax  =  b  may  be  written  as 
(I  -L  -U)x  =  b  where  -L,  -U  are  the  lower  and  upper  parts, 
respectively,  of  A  which  has  been  normalized  so  that  the 
diagonal  components  of  A  are  unity. 

Now  x  =  ( L  +  U )  x  +  b 


=  Mx  +  b  where  M  =  L  +  U. 

Choosing  a  vector  x°  arbitrarily  we  construct  the  sequence 
of  vectors 

x1  =  Mx°  +  b 

x2  =  Mx1  +  b  =  M2x°  +  ( I+M) b 


x(k+1)  =  M(k+1)x°  +  (l+M+M2+.  .  . +M^k  1'))b  . 

If  M  has  eigenvalues  such  that  |  Ai  |  <  |  all  i 
we  have  by  Theorems  1  and  3  that 


( k+1 ) 
xv  ' 

Hence  x 

of  the  system 


(I  -  M)  Xb  =  A  1b. 
k  -1 

lim.  x  =  A  b  which  is  the  solution 
k->  oo 


Ax  =  b  . 

We  may  use  any  norm  N(A)  as  a  test  for  converg¬ 
ence  (theorem  2)  and  a  bound  on  the  rate  of  convergence  is 
given  by 

(N(A))k+1 


1  -  N (A) 


(theorem  4)  . 


.  .  .  •  •  ... 


:00  aW  %  ’ 'jf!  .  U:d-:  s  '-Oo  av  js  i  ■•rfO 


:  ..  -  JV 

d  ?.(  bnti  '  r  :  ?c  3  JSri  aw 
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§  2.64.02  Gauss  Seidel  Iteration 


The  linear  problem  Ax  =  b  may  be  written  in  the 
form  (l-L-U)x=b  where  I,  L,  U  are  as  before. 

Rewriting  the  above  as  (I  -  L)x  =  Ux  +  b  and 
iterating  as  follows: 

(I  -  L)xk+1  =  Uxk  +  b  . 

Premultiplying  by  (I  -  L)~^  we  obtain 

xk+i  =  (!  _  L)_1Uxk  +  (I  -  L)_1b 
assuming  (I  -  L)  is  nonsingular. 

Let  (I  -  L)_1U  =  C  and  (I  -  L)_1b  =  a 


and  obtain 


k+1  „  ,  n  k 

x  =  a  +  Cx 


=  Ckx°  +  (I  +  C  +  C2  +  ...  +  Ck  1)  a 
where  x°  is  an  arbitrary  vector. 

As  before  if  (I  -  L)-1U  =  C  has  eigenvalues  such 
that  [  Ai  |  <  1  ,  all  i,  we  have  by  Theorems  1  and  3  that 

k+1 


x 


(I-C)  1a  -  A  1b. 


§  2.64.03  Example  I 

Find  x-^j  x^  for 

.lx^  +  lx^  =  1 
lx1  +  2x2  =  1 

Now  A  =  (  l  g  )  >  I_Ij  =  (  i  g  )  ’  U  =  (  0  0  ) 

(I-L) -1=  (jt  )  ,  C  =  (X-L)  U  =  (  °  ),  a  =  (I-Lr^+o). 

Now  C  has  eigenvalues  0,  .5  and  therefore  the  Gauss  Seidel 

process  will  converge  for  arbitrary  b. 


.  ...  \  ,'r.:  .;  .  ' 

d  ■•y.'"!  "  Id'  ei l  ■■  "■  /■■■  ■ 


Let 


0 


-  p  s  - 


b  -  ( 
( 


1 
1 

0  -1 

o  .5 


Then  x^  =  a  +  Cx; 


( 


x 


=  a  +  Cx1  =  ( 


1 

0 

1 

0 


\  o 

)  j  * 

)  ,  a,  =  ( 
-  1 


)  +  ( 
)  +  ( 


(  V  ) 


0 


)  • 


.5 

-.5  ) 

.25 


)  =  (  2  ) 


(  1_o:2) 


k  ,  1-2 
x  =  ( 


-k+l 


-k 


)  so  that 


x 


k 


(  o  )  as  k' 


§  2.65  Gradient  and  Conjugate  Gradient  Iterative  .Methods 


Again,  we  consider  the  linear  system  Ax  =  b 
where  A  is  a  real  symmetric  positive  definite  matrix  whose 
inverse  A  1  is  therefore  known  to  exist.  b  and  x  are 
column  vectors;  the  former  is  given,  and  it  is  required  to 
calculate  the  elements  of  the  unique  solution 

x  =  A-1b . 


We  turn  now  to  a  different  approach  and  consider 
the  iterative  methods  which  require  the  quadratic  functional 

f(x)  =  (x-A  ^b)^  A(x-A_bb) 

m  m  T  -1 

=  x  Ax  -  2x  b  +  b  A  b. 


The  equations  f(x)  =  constant  represent 
and  similarly  situated  ellipsoids  with  center  at  A 
Clearly,  f(x)  takes  its  minimum  value,  zero,  at  x  : 


similar 


-1 


A_1b 


Any  estimate  x^  of  A  b  defines  a  point  in  n-space 
and  also  the  ellipsoid  through  this  point.  Approach  to  the 
center  is  represented  by  the  sequence  of  vector  displacements 


k+l  k  A  k  ,  k 
x  -x  =  Ax  =  t^p 


where  tic  is  a  scalar  and  pk  a  vector.  We  define  a  residual 
vector  rk  as  the  inward  pointing  normal  at  xk  to  the  ellip¬ 
soid  through  xk  since 


rk  =  b-Axk  =  -_1  Vf(xk). 
2 


' 


■  ■  ■  1  'i 1  ■" ::  . 


The  direction  defined  by  r  Is  that  direction 
for  which  f(x^)  decreases  most  rapidly  and  hence  is  known 
as  the  direction  of  optimum  gradient  or  steepest  descent. 
An  iteration  which  is  defined  by  a  recursion  of  the  form 


x 


k+1 


k 

=  x  + 


k 

2 

i=o 


c,  .  r' 
ki 


is  called  a  gradient  method. 

In  the  succeeding  sections  we  consider  two 
variants  of  the  gradient  method. 


§  2.63.01  The  Gradient  Method 

k  /  k\ 

The  direction  at  x  =  x  for  which  f(x  )  decreases 

k 

most  rapidly  is  given  by  the  negative  gradient  r  .  We 

continue  in  this  direction  of  steepest  descent  until  we  arrive 
k+1 

at  the  point  x  which  satisfies  the  condition 

/  k+lxT  k  n 
(r  )  r  =  0  . 

Using  the  definition 

k+1  k  ,  k  , 

x  -  x  =  t^p  ,,  and 

r^  =-  b  -  Ax^  j  we  obtain 

k+1  k  ,  A  k 
r  =  r  t^Ap 

k 

Further,,  the  optimum  value  of  t^.  for  the  direction  p  is 


.0 

'k 


/  k\T  k//  kxT.  k 
(p  )  r  /(p  )  Ap 


Alternately,  differentiation  shows  that 


r,  {  k  ,  ,  ,  d.  f  r.  n.  (  tv\  x  tv  JV.\ 

f(x  +  t ,  p  )  =  t,  (p  )  Ap  -  2t,(p  )  r  +  f(x  ) 


kN 


.2/  kNT„  k 


k\  T  k 


k> 


k1 


k 


k 


takes  its  minimum  value  for  t^  . 


J  ,  sb  '■-fid'  J 


The  particular*  iteration  defined  by 


k+1 

x 


t 


k 


k 


is  the  optimum  gradient  or  steepest  descent  method  of 
Temple  (1939) .  Unfortunately,  its  convergence  is  usually 
too  slow  for  the  method  to  be  of  practical  value.  This 
method  is  of  interest,  however,  since  it  serves  as  the  basis 
of  a  large  class  of  iterative  methods. 


§  2.65.02  Conjugate  Gradient  Method 


A  better  method  based  on  the  knowledge  that  the 
center  of  an  ellipsoid  lies  on  the  plane  conjugate  to  a 
given  chord,  known  as  the  conjugate  gradient  method,  can  be 
most  easily  described  by  the  following  computing  algorithm. 


k 

Is  satisfied  for  all  k,  i.e.,  the  directions  p  are  A- 
con jugate  to  each  other.  In  theory  the  process  terminates 
after  n  steps  but  the  inevitable  rounding  errors  in  a 
practical  computation  perturb  the  strategy.  A  more  extens¬ 
ive  treatment  of  this  subject  may  be  found  in  Hestenes  and 
Steifel  (1952)  and  Martin  and  Tee  (1961) . 
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CHAPTER  III 

TECHNIQUES  FOR  SOLVING  NONLINEAR  SYSTEMS 
§  3.1  Introduction 

The  problem  of  solving  systems  of  nonlinear 
equations  has  been  relatively  neglected  in  the  mathematical 
literature.  The  nature  of  the  solutions  and  the  possible 
pitfalls  in  any  general  method  now  available  need  more  care¬ 
ful  attention  and  elucidation. 

For  any  linear  system  of  the  form 
r  =  b  -  Ax  ;  A:  nxn  ;  r,  b,x:  nxl  , 
there  is  a  corresponding  nonlinear  problem  of  obtaining  the 
solutions,  if  any  exist,  or  the  level  points  of  the  test 
function  T(x)  =  r^  C  r  ;  C  :  nxn  , 

positive  definite.  The  single  function  T(x)  will  have  a  solu¬ 
tion  and  vanish  there,  if,  and  only  if  x  =  a  ,  where  a  is  a 
solution  of  the  linear  system.  The  properties  of  the  original 
linear  system  are  closely  linked  with  the  properties  of  the 
Hessian  matrix 

S2t 

H  =  [  hx.Sx.1  ’ 

1  J 

which  has  constant  elements.  If  |  H  |  /0,  a  unique  level  point 
of  T(x)  will  be  a  solution  if  the  initial  system  is  consistent. 
If  |  H  |  =0  ,  there  will  be  a  continuous  locus  of  level  points 
that,  depending  on  definition,  can  be  treated  as  infinitely 
many  solutions  or  as  no  solution. 

In  the  nonlinear  as  in  the  linear  case,  the  Hessian 
matrix  H  -  in  general,  a  function  of  the  independent  variables 


■ 

■ 

' 


' 


. 


. 

. 
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x-^,  x0,  .  ..,  xn —  of  a  test  function  T(x)  ;  x  :  nxl,  generally 
characterizes  the  nature  of  the  solution;  in  the  nonlinear 
case,  however,  there  are  additional  problems.  The  test 
function  T(x)  most  commonly  used  in  connection  with  the  non¬ 
linear  system  f(x)  =  0  ;  f(x)  :  nxl,  is  defined  as  T(x)  = 
(f(x))  f(x).  The  nonlinear  test  function  T(x)  may  have  a 

unique  level  point;  but  it  is  more  likely  that  several  level 
points  or  relative  level  points  may  exist.  Level  points  may 
be  distinct  or  finitely  confluent,  may  occur  in  continuous 
sets  or  any  combination  of  these  situations  may  exist  in  a 
single  problem.  For  level  points  which  are  distinct  we  have 
|  H  |  /0  in  some  open  region  containing  each  level  point.  If 
|  H  |  =0  we  have  a  confluent  set  of  points  and  if  |  H  |  £  0 
there  may  be  a  set  of  nearly  coincident  level  points. 

Milne  (1949)  has  described  a  process  by  which  the 

difficulty  of  coincident  or  nearly  coincident  level  points  may 

be  resolved.  The  various  types  of  simple  level  points  can 

T 

be  distinguished  by  examining  the  quadratic  form  x  Hx  which 
is  positive  definitive  only  at  a  relative  or  absolute  minimum 
solution  of  T(x)  =  0. 

Once  a  solution  has  been  found,  say  x  =  a  ,  a 
further  solution  can  be  obtained  from  the  successor  problem: 

Tx(x)  =  T(x)/|  I  x  -  aQ  |  |  p 

where  |  |  x  |  |  is  a  suitable  norm  of  x  and  p  a  discrimination 
factor.  Since  it  is  impossible  to  determine  an  infinite 
number  of  solutions,  limits  on  either  the  number  of  solutions 
or  the  range  of  their  locus  are  always  required. 


•  alb  ; 


, 
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The  techniques  reviewed  in  this  study  of  the 
nonlinear  problem  usually  require  that  T(x)  and  the  gradient 
vector  Vt(x)  he  calculated  at  the  set  of  points  x^,  x^  , 

.  .  . ,  xnlv  j  from  the  behaviour  of  the  function  at  this  point 
in  n-space,  we  obtain  information  concerning  the  location 
of  possible  solutions.  The  test  points  are  chosen  in  a 
sequential  manner  by  a  fixed  set  of  operations.  The  most 
recent  literature  on  this  topic  is  reviewed  in  Turner  (i960). 
Spang  (1962),  Todd  (1962),  and  Beckenbach  (1956). 

§  3.2  Definitions  and  Theorems 

§  3.21  Definitions  and  Notation 

§  3.21.01  The  nonlinear  system  is  defined  as  f(x)  =  0  ; 

6f . (x) 

f(x),  x  :  nxl .  The  determinant  |  iv  '  |  is  called  the 

Jacobian  or  Functional  Determinant  of  the  functions  f^,  f^, 

.  .  . ,  f  with  respect  to  the  variables  x-,  ,  x0,  .  .  . ,  x  and 

’  n  ^  1J  2J  ’  n 

is  denoted  by 

J  =  •  •  .>fn)  =  3f1(x) 

c)(X1,Xg,  .  .  .  jXn)  ClX  ■ 

6f  (x) 

The  Jacobian  matrix  J  =  [_i_ _ '_]  will  also  be  referred  to 

„  ""d  x. . 

as  the  Jacobian.  J 

§  S . 21 . 02  The  Jacobian  of  the  first  differential  coefficients 
of  a  function  of  n  variables,  taken  with  respect  to  the  sev¬ 
eral  variables,  is  called  the  Hessian  of  the  function. 


, 

' 

. 

■ 

J- 


Thus  if 


T  =  T  (x1,  x2. 


.... x  ) 

*  n' 


> 


V  3T  3t  3t  ^  p 

'  -5x  ’  ’ ' '  ’  '3x  '  51 2t 

then  H  =  - - - - - —  =  [ -  ]  is 

x2,  ...,  xn)  3xi9xj 

the  Hessian  of  the  function  T. 

§  3.21.03  a  is  an  exact  solution  of  the  nonlinear  system 
f(x)  =  0  ;  f(x),  x  :  nxl,  if  f(a)  =  0  ;  xk  is  an  approximate 
solution  if  f(x^)  £  0  . 

§  3.21.04  The  gradient  vector  of  T(x),  where  T(x)  is  a 

function  in  n  variables  x-p  .  .  .  ,  x  ,  is  given  by  : 

g(x)  =  VT(X)  =  (-^  |Xg,  ...,^)T(x)  ;  g(x)  :  nxl  . 

§  3.21.03  A  homogeneous  polynomial  of  the  second  degree  in 

several  variables  x-^,  .  xn  is  called  a  quadratic  form. 

Any  real  quadratic  form  can  be  written  as: 


2 

1,  k  =1 
T 

notation  as  x  Ax 


a . ,  x.  x,  ,  where  a . , 
lk  1  k  *  lk 

T 

where  A  =  A. 


a 


ki 


or  in  matrix 


§  3.21.06  A  function  in  n  variables  denoted  by  C(x)  ; 

1  ■  2  1  2 

x  :  nxl,  is  convex  in  a  domain  D  if  for  x  ,  x  (x  , x  :nxl)eD 


and  any  t,  0  <  t  <  1  we  have 


•• 

■ 

■ 


' 
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A  function  is  strictly  convex  if,  for  those  values  of  t, 
the  <  sign  can  always  be  replaced  by  a  <  sign. 

§  3.22  Theorems  and  Lemmas 

§  3.2.01  Theorem 

The  necessary  and  sufficient  conditions  that 
C.(x)  =  0  have  a  solution  at  a  is  that  g(a)  —  VC  (a)  =  0 
and  C(x)  is  a  convex  function^  Curry  (1944). 

§  3.22.02  Lemma 

A  positive  semidefinite  quadratic  form  is  convex 
and  a  positive  definite  form  is  strictly  convex. 

Proof :  Let  f(x)  =  xTAx  ,  where  A  =  AT. 

Now  f(tx^  +  ( l-t )  x2)^  ^  (x''- )  TAx2+2t  (l-t)  (x^)TAx2+(l-ty~  (x2)TAx2, 
and  tf(xX)  +  (l-t)f (x2)=t(x1)TAx1+(l-t) (x2)TAx2 

Hence  tf(x^)  +  (l-t)f (x2) -f ( tx^+( l-t) x2) =t (l-t) (x1-x2)TA(x2-x2) . 
§  3.22.03  Lemma 

A  square  and  symmetric  matrix  A  is  positive 

m 

definite  if  (s  )As>0  holds  for  s^O,  positive  semidefinite  if 
(s^)As  >  0  for  s>0. 

§  3.3  Simple  and  Gauss-Seidel  Iteration 

The  iterative  methods  which  solve  the  linear 
problem  Ax=b  can  often  be  extended  to  solve  the  nonlinear 
system  f(x)=0  ;  A:nxn;  x,  b,  f(x)  :  nxl . 


•  ••  . 

. 


«  A 


' 


$  3.31  Simple  Iteration 


If  the  initial  nonlinear  system  F(x)  =  0  ; 

F(x)  :  nxl,  can  be  written  in  the  special  vector  form 
x  =  f(x)  ;  f(x)  :  nxl  we  can  construct  a  sequence  of  vectors 
by  the  iterative  equation: 


xk+1  =  f(xk)  ,  x°  arbitrary. 

The  conditions  required  to  ensure  convergence  to  a  solution 
are  that  xk£  a,  where  the  degree  of  proximity  required  depends 
on  the  functions  f^(x)JJ  .  .  . ,  fn(x),  and  all  the  eigenvalues 
of  the  Jacobian  J(a)  have  moduli  <  1  .  A  resonable  approxima¬ 
tion  to  this  eigenvalue  problem  Is  given  by 

aq 

k 


A  max 

i 


i  2  qii  (x)  ,  i  =  1,  2,  . . . ,  n. 

k=l  0xn 


The  example  of  the  next  section  illustrates  this  point. 


§  3.32  Example  of  Simple.  Iteration 

Let  the  problem  be  of  the  general  form 

(X)  =  ^  (x,y) 
y  f2  (x,y)  . 

We  consider  the  particular  case 

x=0 . 2y3-0 . 6y2-0 . 6x2y-0 . 8xy+0 . 6x2+l . 2x+0 . 2y+0 . 4 

y=0 . 2x^+0 . 4x2-0 . 6xy2+l . 2xy-0 . 4y2+l . 2y-0 . 2x+0 . 8 . 


J(x,y)=( 


-1 . 2xy-0 . 8y+l . 2x+l . 2 


0 . 6y2-l . 2y-o . 6x2-0 . 8x+0 . 2 


0 . 6x^+0 . 8x-0 . 6y^+l . 2y-0 . 2  -1 . 2xy+l . 2x-0 . 8y+l . 2 


j(o.2,i.3)=(o;i!;  'oil)  ;  1  A  max|  -  °-6  • 


■ 

■ 

I  • 

" 


:  ■ 
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x1=0.2(1.3)“-0.6(1.3);'-0.6(0.2),‘1(1.3)-0.8(0.2)  (l .  3)  +0 . 6(0 . 2) 2 
+1.2 (0.2) +0.2 (1.3) +0.4=0. 11  , 

y1  =  1.77  ^  and  so  on.  We  obtain  successively 


k 

k 

X 

k 

y 

0 

0.2 

1.3 

1 

0.11 

1.77 

2 

-0.056 

1.680 

3 

-0.002 

1.682 

4 

-0.0094 

1 . 6872 

5 

-0.0086 

1.6849 

6 

-0.0085 

1.6853 

Zaguskin  (1961). 

§  3.33  Gauss-Seidel  Iteration 


As  In  the  method  of  simple  iteration  we  construct 
a  sequence  of  vectors  by  the  iterative  equation: 


k+1  _  /  k+1 

x±  - 


k+1  k  k\  .  t  n 

.  ,x±_1,x±,  .  •  •  >  x^ )  ,  i-1  j  2j  .  .  .  j  rij 


where  the  notation  implies  that  the  (k+1)  iterative  components 
are  used  as  they  become  available.  Speed  of  convergence  is 
often  faster  for  Gauss-Seidel  iteration  than  for  simple 
iteration,  Zaguskin  (1961). 

§  3.4  Gradient  Methods 


The  most  important  methods  available  for  solving 
the  nonlinear  system  are  the  "gradient  methods"  of  minimiza¬ 
tion.  These  methods  use  measurements  of  the  slope  of  the 
function  as  indicators  of  the  direction  toward  the  desired 
solution.  The  successive  approximations  to  the  solution  are 


' 
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determined  by  the  iterative  equation: 

=  x^  +  t^  Ax* 1^,  where  t^  determines  the 

step-size  and  Ax1<;  the  direction.  The  various  gradient 
methods  (cf.|.  2.65)  are  dependent  upon  the  particular  choice 
of  t,  and  Ax1^  Spang  (1962).  The  first  approximation  is 
arbitrary  but  should  be  picked  as  close  as  possible  to  the 
true  solution.  In  practice,  the  slope  of  the  function  can 
usually  be  approximated  numerically.  The  most  recent 
computational  experience  using  gradient  and  conjugate 
gradient  methods  are  available  in  Rosenbrock  (i960),  Martin 
and  Tee  (1961),  and  Powell  (1962) . 

A  recent  trend  in  gradient  methods  stresses  sim¬ 
plicity  of  coding  for  a  stored  program  computer;  in  particular, 
the  calculation  of  derivatives  tends  to  be  avoided.  The 
extremes  of  these  almost  purely  experimental" downhill"  methods 
are  available  in  Ward  (1957)  and  Lance  (1959)  and  are  review¬ 
ed  in  this  chapter. 

# 

§  3.41  Univariate  or  Relaxation  Methods 


The  simplest  and  crudest  gradient  method  is 
obtained  by  varying  only  one  coordinate  component  per  itera¬ 
tive  step.  The  particular  coordinate  can  be  chosen  in  a 
cyclic  or  any  arbitrary  manner.  One  method  is  to  use  the 
coordinate  defined  by 

max 

i 


6T(xk) 


/6x. 


i  =  1,  2, 


n, 


A  slight  modification  of  the  above  method  is  the 

Southwell-Synge  method,  discussed  in  Booth  (1949)*  where  the 

coordinate  x .  is  chosen  by 
J 


(  6T  /*\  \  2  /  '\2m  /-v  2 

V  /dxj  /2d  T/ox  . 


max 

I 


(|i)2/2-^2 


^3 


,  i=l, 2, . . . ,n, 


In  both  cases  the  step-size  t^  can  be  chosen 
from  the  truncated  Taylor  series  for  T(x^+^)  about  the  point 


. 
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x  .  Thus, 

4.  c>T(xk)  /  a2T(xk) 

+  “  IXj  /  <U§  /  • 

We  note  here  that  the  second  derivative  must  be  positive 
§  3.^2  Newton's  Method 


The  extension  of  Newton's  Method  to  the  non¬ 
linear  system  f(x)  =  0  ;  f(x),  x  :  nxl  gives  the  iterative 
process : 


k+1  k 
:  =  x 


J,  f^f  (xk)  ,  where  J,  =  )  ]  # 


k 


k 


Newton's  method  for  the  nonlinear  system  is  usually  written 
in  the  form: 

Jk(xk+1-Xk)  =  -  f(xk)  . 

The  problem  of  minimizing  the  test  function  T(x) 
can  be  written  as  an  iterative  equation  in  a  form  appropriate 
to  Newton's  method  as: 

tj  (  k+1  k\  ^  „  rB2T(xk)  1 

H  (x  -x  )  =  -VT  ,  where  H  =  [ . -^-A— '  J  . 

i  J 

Convergence  in  the  large,  i.e.,  convergence  to 
a  solution  from  an  arbitrary  starting  point,  is  not  assured. 

A  starting  point  symmetrically  placed  between  two  solutions 
will  oscillate  between  the  two  solutions  and  never  converge 
if  no  rounding  error  is  introduced.  To  prevent  initial 
overshoot  -  a  common  occurrence  -  step  limitation  is  often 
desirable.  A  nonconvergent  sequence  can  sometimes  be  overcome 
with  a  restart  at  a  different  initial  approximation. 


o tt  1  .1  TiO  i‘.!.e  /•-.  i£>nno  .  on  .  : 


Experience  suggests  that  Newton's  method  should  be  considered 
only  for  problems  of  limited  scope.  Turner  (i960).  A  discus¬ 
sion  of  Newton's  method  is  also  available  in  Haselgrove  (1961). 


§  3.42.01  First  Modification  of  Newton's  Method 

If  we  replace  =  Jix^)  by  JQ  =  J(x°)  in 
Newton's  method  we  obtain  the  following: 


k+1  k  T  -lw  kx 
X  =  X  -  J  f  (x 

This  iteration  is  computationally  easier  per  step  but  the 
speed  of  convergence  is  decreased.  It  can  be  shown  that,  if 
the  initial  approximation  is  sufficiently  close  to  the  exact 
solution,,  such  a  procedure  will  converge  for  a  large  class  of 
functions  f(x)  =  0.  This  modification  gives  satisfactory 
results  for  problems  in  one  or  two  variables  but  the  extension 
to  more  variables  will  probably  prove  to  be  unsatisfactory. 


§  4.42.02  Second  Modification  of  Newton's  Method 


This  method  is  essentially  equivalent  to  the  first 

modification  of  Newton's  method  but  differs  in  that  a  sequence 

1  2 

of  correction  vectors  X  ,  X  ,  ...  are  constructed  instead  of 

the  previous  sequence  of  approximate  solution  vectors  x\ 

2 

x  y  ...  . 

The  iteration  equation  is  given  by: 
f(x°+X)  =  f(x°)  +  JQX  +  g(X)  =  0  , 

f(x)  ;  X,  g(X)  :  nxl,  where  g(X)  is  defined  by  the 
above  identity.  We  may  rewrite  our  iterative  procedure 


' 
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as : 


X  =  Y  +  G(x)  ,  where 


Y  =  °)  ,  Q(X)  =  -j;1  g(X)  , 


1  2 

We  then  find  successive  correction  vectors  X  ,  X  ,  by  means 


of  the  formulas : 


X 


1 


Y 


Xk+1  =  Y  +  G(Xk)  . 


Beckenbach  (1956) .  An  example  illustrating  the  method  is 
given  in  §  3. 42.04. 

§  3. 42. 03  Example  of  Newton's  Method 

Let  the  nonlinear  system  and  initial  approxima¬ 
tion  be  given  as 


where  f (y, z) =y^+2y2-3yz2+6yz-y-2z2+z+4. 


g(y, z)=-z3+3z2+3y2z+4yz-z-3y2-y-2  . 


Now 


df  df 

j=  [  ^  ^  j  =  t 

6g  6g 

"5y  ‘5z 


3y2+4y-3z2+6z-l, -6yz+6y-4z+l 
6yz+4z-6y-l  s  -3z2+6z+3y2+4y-l 


-2 . 6l6 


0.040 


)  • 


■  -  •  .  \n.  'r/r  •' 
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Our  linear  system  is 

-7.920  Ay  +  9.800  az  =  2.616 

-9. 800  Ay  -  7.920  Az  =-0.040  „  which  has  the 

solution  Ay  =  -0.12803*  Az  =  0.16347.  The  second  approxima¬ 
tion,,  using  the  iterative  equation 

1  o  ,  A  o  , 
x  =  x  +  Ax  ,  becomes 

x  0.27197 
X  =  ^-0.83653^  ‘ 

Repeating  the  process  we  obtain 
Q  0.24823 

X2  =  (  )  . 

-0.82096 

o  1  2 

Exponential  extrapolation  of  x  ,  x  „  x  gives  the  result 

o  0.24283  0 

x-3  =  (_  q  j  which  has  an  error  <  4x10  . 

§  3.42.04  Example  of  Second  Modification  of  Newton's  Method 


3.3 

(  ) 
-3.0 


Let  f(x)  and  x°  be  given  as 


f(y,z)  4+y+z-y2+2yz+3z2  y° 

f(x)  =  (  )  =  (  2  2)  =  0,x°  -  (  ) 

g(y,z)  l+2y-3z+y  +yz-2z  z 

l-2y+2z  ,,  l+2y+6z  -11.6  -10.4 

Then  J  =  [  ]  =  [ 

2+2y+z  ,,  -3+y-4z  ^  5.6  12.3 


X 


o 


-0.14566  -0.12316  0.61 

jf  =  [  ]  ,  f(x°)  =  (  )  . 

0.06632  '0.13738  -0.41 


o 


-1  n  0.0384 
Now  Y  =  -J  f(x°)  =  ;  )  =  X 

0.0159 


,  ■ 


obtained 


1  2 

The  successive  correction  vectors  X  ,  X  are  now 


0.0384 
X1  =  Y  =  (  ) 

0.0159 

GU1)  =  JQ~lg(xl)  =  -J0_1f  (V'X1)  = 
9  0.038627 

X  =  Y  +  G(X  )  =  (  ) 

-0.015419 

0  10  -0. 000005 

G(X^)  =  J0_1f(x  +X2)  =  (  ) 

0.000000 

o  p  0.038622 

X3  =  Y  +  G(X  )  -  (  )  . 

-0.015419 


0.000227 

( 

-0.000281 


Hence  our  final  answer  correct  to  5  decimal  places  is  given 


by  a  =  x 
0  o 


+  X 


3 


where 


a 


3.33862 

( 

-2.98438 


) 


An  important  advantage  of  the  second  modification 
of  Newton Js  method  is  that  we  can  increase  the  number  of 
significant  figures  in  successive  approximations  by  computing 
successively  diminishing  error  vectors. 


§  3.43  Methods  of  Descent 


T 

Let  us  consider  the  test  function  T(x)=(f(x))  f(x) 
in  connection  with  the  nonlinear  system  f (x) =0; f (x) , x:nxl . 
Clearly  T(x)  will  have  a  solution,,  if,  and  only  if  x=a,  where 
a  is  a  solution  of  f(x)=0.  From  elementary  vector  analysis 
we  recall  that  the  direction  of  maximum  decrease  in  the  value 
of  T(x)  is  given  by  the  negative  gradient  -VT(x). 

Let  x=x(t),  where  x(0)=x°  and  t>0,  define  a  curve 
in  n-space.  We  now  define  the  test  function  T(x)  as  G(t) 
where  every  permissable  t  defines  a  corresponding  point  on 
the  curve  x=x(t)  and  a  value  for  T(x).  If  we  follow  the 
curve  x=x(t),  t>0  defined  by  ^  =-VT(x),  x(0)=x°  ,  we  will 
at  least  momentarily  follow  the  path  of  steepest  descent  and 
in  general  we  can  find  a  value  of  t>0  such  that 

G(t)  =  T(x°  -  tVT (x°) )  <  T(x°). 
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1  In  this  manner  ye  construct  a  sequence  of  points 

x  ,  x  , . . .  such  that  T(x vt  )  <  T(xk)  .  Under  suitable 
conditions  we  will  have  convergence  to  a  solution  of  T(x.)  . 
The  value  of  t  required  at  each  step  is  determined  either  by 
trial  and  error  or  by  finding  the  smallest  positive  root  of 
G(t)=0.  This  variation  of  the  method  has  the  geometrical 
interpretation  illustrated  by  the  following  diagram 


which  represents  the  section  of  surfaces  T(x  j  =  c,, 

T(xh-t-l)  _  cic+i  intersected  by  the  plane  through  x^contain- 
ing  the  directions  -VT(x^)  and  -VT(x^-+-)  .  The  surface  we 
seek  T(x^+-1-)  =  must  have  -VT(x^)  as  tangent  and  clearly 

the  subsequent  directions  of  steepest  descent  will  be  at 
right  angles  to  each  other;  i.e.*  we  have  arrived  at  that 
point  on  the  line  of  steepest  descent  where  the  derivative  of 
T(x)  with  respect  to  the  distance  along  the  line  is  zero. 

The  methods  of  "descent"  have  several  advantages: 
they  usually  converge  to  a  solution,,  they  allow  approximations 
to  be  used,  and  they  can  give  more  than  one  solution  if 
several  are  required.  The  recent  trend  in  descent  methods 
is  to  stress  simplicity  of  coding  for  a  stored  program  com¬ 
puter.  Ward  (1957)  and  Lance  (1959)  have  developed  methods 
in  which  the  level  points  are  sought  by  systematic  sampling 
techniques . 


§  3. 43.01  Method  of  Steepest  Descent 

We  now  consider  the  Method  of  Steepest  Descent  in 
greater  detail.  Let  the  test  function  T(x)  be  given  as 


T(x)  =  (f(x))*T(x)  ;  furthermore,  let  x°  be  an  initial 
approximate  solution.  Numerical  methods  for  determining  the 
minimum  of  T(x)  can  be  easily  devised  by  using  the  geometri¬ 
cal  picture.  In  the  neighbourhood  of  the  solution  T(x) 
represents  -  by  assumption  -  a  convex  surface.  Starting 
with  x°  sufficiently  close  to  the  solution  and  proceeding 
in  the  proper  direction  Ax°  along  any  straight  line  (except 
the  one  which  is  tangent  to  T(x)  =  constant)  we  can  always 
arrive  at  a  point  x ^  which  is  a  better  approximation  to  the 
solution. 

k+l  lr  ic 

Let  x  =  x  +  t^Ax  define  the  iteration.  If 
Axk  is  chosen  by  Axk  =  -  VT(x°)  we  have  the  direction  of 
steepest  descent.  The  optimum  t^  is  calculated  as  a  root 
of  the  equation  (in  t) : 

d  T(xk  -  tVT (xk) )  =  0  . 

"St 

In  the  calculation  of  t^  there  is  no  object  in 
striving  after  great  accuracy.  A  relative  error  of  several 
percent  should  not  greatly  affect  the  rapidity  of  convergence. 
The  solution  of  the  above  optimization  problem  may  be  effect¬ 
ed  in  a  satisfactory  manner  by  determining  the  first  point 
on  the  directed  line  segment  -  defined  by  an  initial  point 
xk  and  direction  -VT(xk)—  for  which  the  direction  of  steepest 
descent  at  that  point  is  perpendicular  to  the  given  line. 

More  explicitly:  let  the  directed  line  segmented  be  defined 
by  xk,  sk  =  -VT(xk)/  |  VT(xk)  |  ;  the  point  on  the  line  is 
given  as  xk+^  =  xk  +  t^sk  ;  and  the  condition  to  be  satisfied 
is  (s  )  s  =  0  .  The  desired  value  of  t^  can  be  determined 


,  ■  .  .  , 

X9  V 

.'■•.I- 

' 

i  ■  :  ■ !  : 
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very  satisfactorily  by  linear  Interpolation:  the  following 
approximate  value  determined  by  Newton's  Method  of  t^.  Is 
also  used  if  xk  ~  a: 


tk  ~  T(xk)  . 

( VT(xk)  )  iyT(xk) 

Again,,  we  illustrate  the  above  procedure  by  the 
following  example. 

§  3.43.02  Example  of  Steepest  Descent 

Let  f(x)  =  0  be  defined  as 
f1(x,y,z)  =  x2  +  y2  +  z2  -  9  =  0 
f2(x,y,z)  =x+y+z-3=0 
f^(x,y,z)  =  x  -  y  =  0 

m 

Our  test  function  T(x)  =  (f(x))  f(x)  becomes 

2  2  2 

T(x,y, z )  =  ( x2+y2+z ^ - 9)  +  ( x+y+z - 3 ) + ( x-y )  . 


The  direction  of  steepest  descent  is  given  by 


-  VT  (x,  y ,  z) , 


where  VT(x,y,z) 


4x^+4xy2+4xz2-32x+2z-6 
(  4y^+4yx2+4yz2-32y+2z-6  ) 

4z^+4zx2+4zy2-34z+2x+2y- 6 


Let  x°  =  (x°,y°,z°)  =  (0,0,1)  . 
We  shall  use  the  approximation 


,  ^  /  k  k  kN 

tk  ~  T(x*y  ) 


T 


(VT(xk,yk,zk)  VT(xk,yk,zk) 


for  the  step-size  problem. 


at 


.  .  I-,;  ■ .  i  ■  v;o  I 
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STEP  1 


T(0,0,1)  =  (-8)2+(-2)2  =  68 


VT(0,  0,1) 


-4 

(  -4  ) 
-36 


t  = 
o 


68 


(-4)2+(-4)2+(-36)2 


1  o 


Hence  xx  =  xu  -  toVT(x°)  gives 


68 


1328 


~  1 

'V  -1 

20 


x 


x 


-4 


0 


1 


STEP  2 


(y1-)  =  (y°)  -  1_  (-4  )  =  (0)  +  (i  )  =  (0.2) 

o  20  - 


0.2 


-36  1 

T(0.2,0.2,2.8)  £  1.2 


5 


VT(0.2,02,2.8)  =  -.46 

(  -.46  ) 

-11.70 

tn  ~  1.2  ~  1 

(-12)2  120 

2  1  1 
Hence  x  =  x  -  VT(x  )  gives 

2 


2.8 


0.2 


(y2)  =  (0.2)  +  t^)  (.46  ) 


2. 


.  46 
.  46 
11.70 


0.2 

(0.2) 

2.9 


STEP  3 


T  ( o .  2, 0 . 2, 2 . 9)  = 
VT(0.2,0.2,2.9)  = 


25  +  .09  =  .34 

0.19 
(  0.19) 

-5.32 


^2  ~ 


34/28.3  ~  1/80  . 


op  p 

Hence,  xJ  =  x  -  VT(x  )  gives 


0.2 


0.2 
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o 

x3 

0.2  0.2  0.2 

(y3)  = 

=  (0.2)  -  1/8 0  (0.2  )  =  (0.2  )  . 

z3 

2.9  -5.32  2.96 

STEP  4 

T (0 . 2, 0 . 2, 2 . 96)  £  .15 

VT(0.2,0.2,2.96)  ~  .58 

(  .58) 

-1.15 

t3  £  .15/2.01  ~  .075 

4  Q  O 

Hence,  x  =  x  -  t^  VT(x^)  becomes 


4 

X 

0.2  .58  .15 

/  4x 

(y  )  = 

4 

z 

(0.2  )  -  .075  (  .58)  =  (  .15)  . 

2.96  -1.15  2.97 

We  observe  that  the  rate  of  convergence  in  the 

region  of  the  true  solution  (0,0,3)  is  extremely  slow.  This 
example  illustrates  the  inherent  weakness  of  the  method. 

Hence,  in  the  region  of  a  stationary  value  other  methods,  e.g. 
Newton's  method,  is  perferable  to  the  method  of  Steepest 
Descent . 

§  3.44  Steepest  Descent  by  Integration  of  a  set  of  Ordinary 


Differential  Equations 

Let  us  at  this  point  sketch  in  quickly  an  approach 
which  follows  naturally  from  the  methodology  of  steepest 
descent.  The  nonlinear  system  f(x)  =  0  ;  f(x),  x  :  nxl,  can  be 
replaced  by  the  initial  value  problem 

dx  =  -  VT(x)  ;  x(0)  =  x°  . 

dt 
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The  above  system  can  be  solved  analytically 
only  for  very  simple  problems  but  numerical  Integration 
may  prove  to  be  satisfactory.  The  interval  size  and  the 
number  of  iterations  allowed  in  numerical  methods  before  a 
restart  is  required  will  in  general  depend  upon  the 
particular  problem,  Bechenbach  (1956).  Despite  its  obvious 
interest  this  approach  has  not  been  much  explored  and  its 
potential  value  remains  unassessed. 

§  3.45  Conjugate  Gradient  Method  of  Steepest  Descent 


In  many  ways  the  conjugate  gradient  method  is 
similar  to  the  method  of  steepest  descent  previously  review¬ 
ed  in  §  3. 43. 01;  however,  this  recent  method  is  quadratically 
convergent.  A  disadvantage  of  the  method  of  steepest  descent 
is  its  poor  convergence  near  a  stationary  value.  This 
difficulty  can  be  overcome  by  switching  to  another  method 
such,  as  Newton's  method  which  is  quadratically  convergent 
near  a  stationary  value.  Gradient  methods  which  have  second- 
order  convergence  for  finding  the  minimum  of  a  quadratic 
positive  definite  function  have  already  been  reviewed  in 
§  2.65  and  are  extensively  considered  in  Martin  and  Tee  (1952). 
A  method  -  in  some  ways  similar  to  the  conjugate  gradient 
method  of  Hestenes  and  Stiefel  (1952)  -  which  finds  station¬ 
ary  values  of  a  general  function  in  several  variables  is 
given  by  Powell  (1962) .  This  particular  method  which  has 
second-order  convergence  is  worth  outlining  in  some  detail. 

§  3.45.01  Basic  Assumption  of  the  Method 

Prom  the  general  nonlinear  problem 
f(x)  =  0  ;  f(x),  x  :  nxl  , 
we  construct  the  non-negative  test  function 

T(x)  =  (f(x))Tf(x)  . 

Let  G(x)  be  an  approximation  to  T(x)  in  the  neighbourhood 
of  the  stationary  value  a,  i.e., 

G(x)  =  T(a)  +  _1  (x-a)TH(x-a)  , 

2 


.  ■ 
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2 

where  H  =  [  cTT (a)/dx^dx  .  ]  .  Then  as  a  consequence  of  the 
definitions  and  assumptions,, 

dT  (a)  =0  i  =  1,2, ...,n. 


and  dT^(a)/dx^dXj  =  (a)/dxjdx^  . 

The  iterative  procedure  will  have  second-order  convergence 
if  and  only  if  the  iteration  leads  from  an  arbitrary 
estimate  of  the  stationary  value,  say  x°,  to  a  in  a  single 
cycle,  i.e.  when 

T(x)  ^  G(x)  . 

The  method  is  based  on  the  corollary  of  the 
theorem  that  if  G(x)  is  quadratic  in  the  independent  variables 
then  any  line  which  passes  through  a  intersects  the  members 
of  the  family  of  contours  G(x)  =  C  at  equal  angles.  The 
corollary  is  that  if  the  normal  at  t  j  t  :  nxl  ,  to  the 
contour  G(x)  =  G(t)  is  parallel  to  the  normal  at  t1;  t^:  nxl, 
to  G(x)  =  G(t  )  then  the  line  joining  t  to  t  passes  through 
a.  Before  we  proceed  to  define  the  procedure  in  the  general 
case  we  will  discuss  the  two-dimensional  case. 

§  3.43.02  Conjugate  Gradient  in  Two  Dimensions 

The  manner  in  which  the  iteration  proceeds  for 
the  two-dimensional  case  can  be  most  easily  described  with 
the  assistance  of  the  following  diagram*. 


srij  at 
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and  x1  is  a  point  on  the  line  defined  by  -VT(x°)  and  x°. 

The  point  x^  may  be  any  point  on  the  line  which  is  a  finite 

distance  away  from  x°,  however,,  if  it  is  chosen  as  the 

point  where  the  derivative  of  T(x)  with  respect  to  the 

distance  along  the  line  is  zero  then  the  convergence  of  the 

p 

process  is  assured.  The  point  x  is  now  obtained  in  a 

2  1  T  1  o 

similar  manner  subject  to  the  constraint  that  (x  -x  )  (x  -x  ) 

=  0  ;  i.e,  the  successive  directions  are  conjugate  to  each 

2  o 

other.  Clearly.,  a  lies  on  the  line  determined  by  x  and  x  . 

If,  as  is  usual,  T(x)/  G(x),  the  same  recipe  is 
used  for  each  cycle,  and,  in  general,  a  will  not  be  found 
by  a  single  iteration. 
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§  3. ^5. 03  Conjugate  Gradient  In  n-Dlmenslons 

The  previous  description  of  an  iterative  cycle 
in  two-dimensions  allows  us  to  formulate  the  general  pro¬ 
cedure  for  the  n-dimensional  case  in  a  simple  manner.  In 
the  n-dimensional  case  the  procedure  now  consists  of 
evaluating  the  gradient  at  x°  and  determining  a  point  x\ 
on  the  gradient  vector ,  at  which  the  derivative  of  T(x) 
with  respect  to  the  distance  along  the  line  is  zero.  Of 
course,  we  may  choose  x^  as  any  point  on  the  line  which  is 
a  finite  distance  from  x°  ;  however,  for  convergence  to 
he  assured,  we  choose  a  good  approximation  to  the  optimum 
distance.  The  second  step  of  each  iterative  cycle  consists 
of  determining  a  stationary  value  T(x)  in  the  (n-l)  - 
dimensional  hyperplane  which  contains  x ^  and  is  defined  by 
the  normal  line  x°-x\  Let  this  stationary  value  be  denoted 
as  x2n~2 .  Since  the  normal  to  T(x)  =  T(x2n_2)  at  x  =  x2n-2 
is  parallel  to  the  normal  to  T(x)  =  T(x°)  at  x  =  x°  the 
required  stationary  value  in  n-space  will  be  that  point  on 
the  line  joining  x  to  x  where  the  derivative  of  T(x) 

with  respect  to  the  distance  along  the  line  is  zero.  In  the 
process  described  the  steepest  descent  at  a  point  has  to  be 
evaluated  first  in  n-dimensions,  then  in  (n-l)  -  dimensions, 
...,  and  finally  in  one  dimension.  Also,  on  (2n-l) 
occasions,  knowing  the  derivatives  at  a  point  on  a  line 
we  have  to  calculate  the  point  where  the  derivative  with 
respect  to  the  distance  along  the  line. is  zero.  We 
recognize  that  the  first  (n-l)  times  in  a  cycle  this  is 
done  It  is  not  essential,  while  it  is  essential  to  find 


the  stationary  value  the  last  n  times.  The  numerical 
results  available  in  Powell  (1962)  suggest  that  this  con¬ 
jugate  gradient  method  is  much  more  powerful  than  that  of 
steepest  descents;  furthermore,  since  it  can  be  programmed 
without  too  much  difficulty,  it  is  very  suitable  for 
electronic  computing  calculations. 

§  3.43.04  Example  of  the  Conjugate  Gradient  Method 
The  following  example  defined  as 
T(x)  =  (x-L+X0x2)2  +  5(x3-x1|)2  +  (x2-2x3)il  +  10(x1-xit)it 
x°  =  (3, -1,0,1) 

was  chosen  in  order  that  the  conjugate  gradient  method 
could  be  compared  with  the  work  of  previous  authors. 

Table  I  Comparison  of  Results 


k 

steepest 

descent 

Booth  1 s 
variation 

conjugate 

gradient 

0 

215.000 

215.000 

215.000 

7 

6.355 

5.352 

0.009 

14 

3.743 

0.620 

9x10" 5 

21 

2.269 

0.135 

-  6 

2x10 

28 

1.420 

0.051 

2x10" ^ 

35 

0.919 

0.009 

lxlO"6 

42 

0.614 

0.008 

5xlO"8 

49 

0.423 

0.008 

4x10" 9 

We  note  that  in  four  dimensions  the  conjugate 
gradient  method  requires  a  minimum  to  be  calculated  in 


8o  - 


(2n-l)  or  7  distinct  directions  in  order  to  complete  a  cycle 
and  hence  the  comparison  is  as  above,  Powell  (1962) . 

The  second  iteration  is  as  follows. 

Table  II  The  second  Iteration  of  the  Method 


point 

X1 

X2 

X3 

x4 

T  (x) 

A 

0.1266 

-0.0123 

0.1455 

0.1428 

.00852 

B 

0.1263 

-0.0104 

0.1337 

0.1441 

.00699 

C 

0.1260 

-0.0124 

0.1333 

0.1435 

.00659 

D 

0.1259 

-0.0111 

0.1331 

0.1391 

.00632 

E 

0.1229 

-0.0107 

0.1332 

0.1392 

.00632 

F 

0.1229 

-0.0107 

0.1332 

0.1392 

.00632 

G 

0.1166 

-0.0114 

0.1323 

0.1303 

.00583 

H 

0.0396 

-0.0044 

0.0300 

0.0332 

.00009 

The  reason  that  the  conjugate  gradient  method 

does  not  have  second-order  convergence  in  this  case  is  that 

2  2 

T(x)  =  (x-^+lOx^)  +  5(x-„>-x^)  near  the  minimum,  hence  it  does 

not  determine  the  minimum  uniquely. 

In  the  second  iteration  the  first  and  second 
steps  from  A  to  B  and  B  to  C  correspond  to  the  ordinary 
steepest  descent  method;  the  steps  from  C  to  D  and  D  to  E 
correspond  to  steepest  descent  in  two  and  one  dimensions, 
respectively.  Since  E  nearly  coincides  with  D  -  and  this  is 
a  common  occurrence  -  the  minimum  on  CE,  F,  nearly  coincides 
with  E .  The  finding  of  G,  the  minimum  on  BP,  gives  a  slight 
improvement  in  T(x),  and  the  final  step,  finding  H  on  AG-, 


O' 
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reduces  the  value  of  T(x)  handsomely. 


§  3* ^5-05  Computational  Considerations 


The  numerical  computations  in  the  construction 
of  the  conjugate  gradient  vectors  can  be  effected  in  a  satis¬ 
factory  and  efficient  manner  by  using  projection  matrices  - 
defined  in  §  5. 21.07.  We  define  Q  as  the  intersection  of  q 
linearly  independent  hyperplanes  in  n-space  and  as  the 
projection  matrix  which  takes  aryvector  in  n-space  into  the 

intersection  Q.  Let  us  define  the  successive  hyperplanes 

T 

required  in  the  conjugate  gradient  method  by  n^  x  =  b^  „ 

T 

normalized  so  that  n^  n.  =  1,  i  =  1,  2,  ..0  q.  We  have,, 
therefore j 

Q  :  Nq  x  -  b  ,  Nq  -  [n^  n2,  ....  n  ]  :  nxq  , 

m  __  1  m 

and  P  :  I  -  Nc  (N  I\P  )  N  ,  where  I  :  nxn  unit  matrix, 
q  n  q  q  q  q  n 

Since  the  successive  hyperplanes  are  conjugate  to  each  other 
the  successive  projection  matrices  can  be  obtained  recursively 
from 


P  =P  -  n  n  T,  p  =  I 
q  q-1  q  q  o  n 

At  the  q-th  stage  of  each  iterative  cycle  we  require  both  the 
direction  of  steepest  descent  in  (n-q)  -  dimensions  and  the 
optimum  step-size  in  that  direction.  Clearly.,  the  direction 
of  steepest  descent  under  these  constraints  is  given  by 


dq+1  =  P  V  T(xq)  ,  q=0,  1,  ....  n-1,  P  =  I  . 

A  suitable  numerical  method  of  obtaining  the 
optimum  step-size  along  dq+^  is  the  method  of  linear  inter¬ 
polation  or  the  method  of  bisections.  Computing  experience 
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indicates,  however,  that  the  point  obtained  by  linear 
interpolation  be  subsequently  checked  for  accuracy.  The 
following  computing  algorithm  describes  a  method  which  has 
been  found  to  be  successful. 

Given  :  (PR;  tR;  xk)  ;  Find  :  (Pk+1Jtk+1;  xk+1)  ; 

where  P,  ,  P,  . ,  :  nxn  :  xk,  xk+^  :  nxl  :  t,  ,  t,  ,  n  :  scalars, 
k  k+1  k  k+1 


73  ;  tk 

t2«-  1.0 

t  0.0 
xtemp<-  xk 


conditional 

branch 


comments 

,  |  k 

tk  =  I  x  -x 


k-1 


9408  ;  h-^  t 

t  t+J5 

h  ta 

086  ;  x  <-  xtemp-txs 

g  <-  -VT  (x) 
stemp  g/  |g  | 

71  ;  tg  s^stemp 

I  tg|  :  e  ,  $*540 

t  :  0  ,  -^9408 

t  <-  t+Je(t2/t1-t2) 

i  h2'hl 


t  ~ 


k+1  k 

x  -x 


~  k+1  ,  k\  / ,  /  k 

X  ^  X  ,  s=g ( X  )/  |g(x 


is  |  g(x)Tg(xk) |  £  0? 

is  t  too  large? 

linear  interpolation 
for  new  t 
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— * 

086 

4540  J  tk+1 

— 

t 

,  |  k+1  k  | 

tk+1=  1  X  -x  1 

k+1 

X 

- 

xk+tk+is 

Pk+1 

- 

Pk-sst 

another  constraint  added 

END 

In  §3. 45.06  we  incorporate  the  above  in  the  basic  computing 
algorithm. 

At  the  end  of  each  stage  for  the  first  (n-l) 
stages  we  add  another  constraint  or  hyperplane  to  our  projec¬ 
tion  matrix;  at  the  end  of  the  (n-l)-st  stage  we  have  the 
problem  of  determining  the  direction  of  steepest  descent  in 
n-space  which  lies  on  (n-l)  hyperplanes.  The  following 
sequence  is  then  available  :  x0,  x\  xn. 

We  now  have  the  problem  of  calculating  the 

stationary  value  x  m  the  direction  x  -x  ,  x  m  the 

.  n-3  n+1  2n-l  ,,  .  o  2n-2 

direction  x  -x  ,  .  .  . ,  x  m  the  direction  x  -x 

2n- 1 

x  is  the  final  value  in  each  iterative  cycle.  The  opti¬ 

mum  step-size  can  again  be  determined  by  some  method  such 
as  bisection  or  linear  interpolation. 

§  3.43.06  The  Basic  Computing  Algorithm 

The  following  computing  algorithm,  has  been  test¬ 
ed  on  the  1620  IBM  electronic  computer  for  the  nonlinear 
system  defined  by: 


r 

-•  ;  -j)  no  C  i i ■:>  trui  j.sqs-n' 
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x-^+10xn 

f  (x)  =  (/!>(  x.-x4)  ) 


x2-2x, 

v/l0(x1-x4) 


3 

x°=(-l  )  ;  n=4, h=l . 0, e=0. 1, el  =0.001; 
0 
1 


P, I  :nxn; xm:nx(n+l) ;g,  s, stemp, x, xtemp, f :nxl. 


comments 

Po=I4 


5408;  P  4-  In 

i  <-  1 
t  4-  1.1 

h2<-  h 

l  -  2 

7;  g  <-  - VT (x) 

l  :  2  ,6 

s  «-  Pg 

g  <-  s 

,  T 

t2  *-  g  g 

P  -  P-ssT/t2 
9;  t2<-  I  g  I 

stemp  4-  g/t2 
i  :  2 

s  «-  stemp 

xm  ( i )  «-  x 

73;  «-  h2 

t  .g  1 . 0 


value  later  req'd  to 
end  conjugation 


T(x)=(f(x))Tf(x) 
is  g  temporary 
defines  directed  line 

defines  new  projec¬ 
tion  matrix 


,171,173 

defines  unit  vector 
on  directed  line 

x1  stored  for  later 
use 

linear  interpolation 
begins 

see  §3.45.05 


.3  ->•  <  '3; 
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h. 


0.0 


9408; 


086; 


71; 


xtemp  «-  x 
hi  -  h2 
1^2  ■*“  ^2+^3 

h-  *2 

x  <-  xtemp-h2xs 

i  <-  1 
->7 
t^  <- 


T 

s  stemp 


e 

0 


h 


t2 


2-  hg+h3(  ^ 


tlt2 


h 


3 


4540; 


fvalue 

n 

i 


4401; 


1 


h2  hl 

->  086 

-  i+1 

■  f(x) 
i 

-  2 
->  7 

-  t-1.0 

0 


ix-l 


,^4540 

,^9408 


,^4401 


,<  632 


next  case 


is  the  conjugation  phase 
completed? 


stop  conjugation  process 


e  «-0.2e 


632; 


l 


1 


1-1 


1  * 


xm(i)  <- 


0 

x 


increase  accuracy  req’d  in 
step-size  problem 

,^4540  is  iterative  cycle  complete? 
store  previous  x 
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s  «- xm(^-| ) -xm(i)  determine  direction  of  search 

s  «-  s/  |  s  I 

x  ■*-  xm  (i  ^ )  define  starting  point  cf  search 


i 

«- 1-1 

i 

:  3 

->  7 

fvalue 

.  ex 

END 

xm(l) 

<-  X 

take  last  result  as 
initial  value 

new 

h 

«-  0 . 2h 

define  new  initial 

step-size 

e 

5  •  Oe 

->  5408 

branch  to  restart 

END 

§3.46  A  Mixed  Method 

Davidon  (1959)  has  described  methods  for  locat¬ 
ing  a  minimum  of  a  quadratic  test  function,,  involving  the 
calculation  of  an  approximate  inverse  of  the  Hessian 
[a  T/8x. dx.]  without  requiring  that  the  Hessian  be  computed 

J 

explicitly. 

Starting  with  x°  as  an  initial  approximation  to 
a  and  H°  as  an  approximation  of  the  inverse  H-\  Davidon 
computed  the  gradient  VT(x°)  and  a  new  test  point  xpx°  = 
x°  -  H°VT(x°) .  If  T(x°)  <  T(x°)  or  if  there  is  a  minimum  of 
T  between  x°  and  x°,  the  approximate  inverse  is  modified  so 
that  the  smallest value  of  T  would  have  been  arrived  at  in 
one  step: 

x"^  =  x°  -  H^VT(x°)  . 


' 


. 
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The  difference  matrix  H^-H0  is  so  chosen  that 
the  step  x^-x°  is  multiplied  by  a  proper  scalar  o  that  all 
perpendicular  steps  would  be  unchanged.  The  process  is 
repeated  until  the  predicted  change  is  less  than  some 
specified  limit. 

The  method  of  modification  depends  upon  the 
following  results  from  matrix  algebra,  v  :  column  vector, 
its  transpose,  and  b  a  scalar,  then  the  matrix 

H1  =  H°  +  b(H°v) (H°v)T 

TTd 

v  H  v 

has  the  property  that 

H1v  =  (l+b)  H°v 

and,  if  y  is  any  vector  such  that 
(H°v)^v  =  0,  then 

TT1  tt° 

H  y  =  H  y  . 

Further,  if 


II 

o 

* 

£ 

1  H°  | 

,  and 

II 

1 — 1 

X 

£ 

1  H1  | 

,  then 

VTtx1)  = 

(l+b) 

VT (x°) 

Finally,  if  H°  is  singular  and  annuls  a  vector  Z,  then 
also  annuls  Z. 

Davidon  also  describes  a  method  which  is  equiva¬ 
lent  to  a  conjugate  gradient  method  when  T  is  a  quadratic 
form  in  the  independent  variables.  Starting  with  x°,  VT(x°), 
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and  H°,  we  find  by: 

x  =  x  -  H  VT(x  )  , 

x"*"  =  x°  -  VT(x°),  so  that 

x1  =  x°  -  H1  VT(x°)  . 

This  is  achieved  by  the  adjustment 

H1  =  H°  -  b(H°VT(x°) ) (H°VT(x°) )T 
where  b  =  1  /(VT(x°))T  h°(VT(x°)  -  VT(x°)) 
if  H°  is  symmetrical  and  if  b  can  be  computed.  A  more 
comprehensive  treatment  of  the  method  can  be  found  in 
Davidon  (1959) .  This  method  has  not  been  adequately  tested 
beyond  Davidon’ s  comment  that  it  appeared  to  be  faster  and 
more  effective  than  other  known  computing  methods. 

§  3.5  Direct  Search  Methods 


The  recent  trend  in  descent  methods  is  to  stress 
simplicity  in  coding  for  a  stored  program  computer;  in 
particular,  the  calculation  of  derivatives  tends  to  be  avoided. 
The  extremes  of  this  trend  occur  in  the  downhill  methods  of 
Ward  (1957)  and  Lance  (1959) *  in  which  the  zeros  of  test 
functions  are  sought  by  systematic  sampling  procedures.  The 
phrase  "direct  search"  is  used  to  describe  sequential  examina¬ 
tion  of  trial  solutions  involving  comparison  of  each  trial 
solution  with  the  "best"  obtained  up  to  that  time  together 
with  a  strategy  for  determining  what  the  next  trial  solution 
will  be.  The  strategy  used  in  selecting  a  new  point  is 
dictated  by  various  aspects  of  the  problem,  including  one's 
knowledge  of  the  solution  space,  the  rules  of  transition 
between  states,  and  the  rules  for  selecting  trial  points. 

§  3.51  Downhill  Method  of  Ward  and  Lance 


The  test  function  preferred  by  both  Ward  and 


Lance  is  the  absolute-value  norm  or  Gerschgorin  vector  norm 


n 

T(x)  =2  (f .  (x)|  ;  x  :  nxl  . 

1=1  1 
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By  Ward's  method,  values  of  the  test  function 
are  computed  at  an  arbitrary  starting  point  x°  and  at 
x°  ±  he^  ,  for  1=1,  2,  .  ..,  n,  where  h  is  an  arbitrary 

step-size  and  e^  the  unit  vector  whose  only  non-zero  component 
is  in  the  i-th  position.  If  each  independent  variable  is 
changed  by  +  h  provided  that 

T(x°  +  he^)  <  T(x°) 

and  otherwise  is  left  unchanged,  a  point  x^  is  reached  which 
satisfies 

T (x1)  <  T(x°). 

A  "pattern  move"  is  next  made  defining  a  point  x  =  x^  which 
consists  of  changing  all  independent  variables  by  an  amount 
equal  to  the  sum  of  the  changes  made  in  those  variables 
during  the  previous  n  moves  or  searches.  The  process  is  now 
repeated  with  the  previous  point  x^  as  the  starting  point. 

The  event 

T(xk)  >  T(xk) 

at  the  conclusion  of  the  k-th  iterative  cycle  results  in 

k  _ k 

x  being  ignored  and  replaced  by  x  .  Eventually  we  arrive 

_ kil 

at  a  point  x  which  satisfies 

T(xk+1)  =  T(xk)  ; 

at  this  time  we  replace  h  by  h/2  and  repeat  the  previous 
calculations.  This  process  of  searches  and  pattern  moves 
is  continued  until  the  s  tep-size  is  reduced  to  a  predeter¬ 
mined  minimum. 

Lance  (1959)  has  used  a  variant  of  this  procedure 
to  approximate  the  gradient  of  the  test  function  T(x)  from 
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the  relations 


c)T  £ 

■3x4 


-  T(x°)  -  T (x°+he . ) 


TT 


i'  ,  or 


JT  ,  1  ,  T(x°-he±)  -  T(x°+he^) 

J  — \  " 


"Sx. 


h 


)  * 


which  lead,  respectively,  to  the  following  approximations 


for  x  : 


x^  =  x?  +  T(x°)  -  T(x°+he^)  ,  or 

x^  =  x?  +  1  (T (x°-he . )  -  T(x°+he.)  )  , 

-1  -L  *2  1  ± 

i  =  1,  2,  .  .  .  ,  n. 

The  first  approach  is  preferred  because  fewer  values  of  the 
test  functions  are  required.  Using  the  Gerschgorin  vector 
norm  sometimes  leads  to  stalemate  situations  but  this 
difficulty  can  be  detected  and  remedied.  Ward's  method  de¬ 
generates  into  a  stalemate  situation  when  the  test  function 
contains  a  "trough"  which  is  inclined  to  the  axis.  This 
particular  difficulty  is  treated  effectively  by  Lance's 
variation.  An  example  of  a  simple  problem  where  Lance's 
method  failed  is  given  by 


In  general,  it  can  be  said  that,  Lance's  method  may  tend 
toward  a  stalemate  situation  when  the  test  function  has 
troughs  along  the  coordinate  axis. 


■ 
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To  conclude  this  discussion  of  downhill  methods 
the  following  properties  may  be  noted: 

(i)  The  descent  or  downhill  methods  seen  to  assure, 
in  general,  convergence  in  the  large,  but  the  rate  of  con¬ 
vergence  and  accuracy  may  be  poor  at  multiple  roots.  The 
development  of  stalemate  situations  may  prevent  ultimate 
convergence . 

(ii)  A  general  technique  can  be  devised  for  a  wide 
variety  of  problems  with  little  or  no  modification  from 
problem  to  problem. 

(iii)  One  important  fault  of  these  methods  is  that 
they  will  cause  convergence  to  a  point  of  indeterminacy  of 

a  system  of  equations,  and  generally  will  do  so  without 
giving  warning  of  any  kind  (cf.  the  example  above). 
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CHAPTER  IV 
LINEAR  PROGRAMMING 

§  4.1  Introduction 

The  mathematical  theory  of  Linear  Programming  is 
concerned  with  optimizing  a  linear  expression  subject  to 
linear  constraints. 

Historically  the  general  problem  of  linear  pro¬ 
gramming  was  first  developed  and  applied  in  19-4-7  by  Dantzig, 
Wood,  and  their  associates  of  the  U.S.  Department  of  the  Air 
Force.  At  that  time,  this  group  was  studying  the  feasibility 
of  applying  mathematics  and  related  techniques  to  military 
programming  and  planning  problems.  The  initial  mathematical 
statement  of  the  general  problem  of  linear  programming  was 
made  by  Dantzig  in  194-7;  he  also  devised  a  systematic  pro¬ 
cedure  for  solving  the  problem  known  as  the  Simplex  method. 
Prior  to  this  a  number  of  problems  (some  unsolved)  had  been 
posed  which  required  the  optimization  of  a  linear  function 
subject  to  linear  constraints.  The  more  important  examples 
include  the  transportation  problem  posed  initially  by 
Hitchcock  (194-1)  and  later  by  Koopmans  (194-9)  and  the  diet 
problem  of  Stigler  (194-5)  .  The  first  successful  application 
of  a  high-speed  electronic  computer  to  the  linear  program¬ 
ming  problem  occurred  in  January,  1952,  on  the  N.B.S.  SEAC 
machine . 

The  fundamental  literature  in  this  field  can  be 
found  in  the  published  papers  of  the  Cowles  Commission  for 
Research  in  Economics  resulting  from  a  conference  held  in 
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Chicago  in  June,  1949.  Most  of  the  conference  papers  are 
available  in  Koopmans  (1951) •  In  particular,  the  papers 
by  Dantzig  (1951a, b)  can  be  said  to  define  the  new 
discipline  of  Linear  Programming. 

An  important  reference  on  the  theory  of  linear 
inequalities  is  Kuhn  and  Tucker  (1956) .  This  reference 
includes  papers  which  present  a  detailed  exposition  of  the 
fundamental  mathematical  results  which  form  a  basis  for  the 
linear  programming  models :  papers  which  answer  purely  math¬ 
ematical  questions  that  have  appeared  in  the  elaboration  of 
the  economic  theory;  papers  which  include  explicit  results 
such  as  the  duality  theorem  of  linear  programming  for  in¬ 
dependent,  mathematical  purposes;  and  finally,  papers  which 
consider  problems  on  the  economic  application  of  the  models. 

Problems  arise  in  a  large  number  of  fields  to  which 
the  discipline  of  Linear  Programming  can  be  successfully 
applied  and  include,  among  others,  the  following:  industrial 
applications,  transportation  problems,  contract  awards, 
military  applications,  marketing  analysis,  and  production 
scheduling.  The  methods  now  available  in  Linear  Programming 
are  many  and  varied  but  for  the  purposes  of  this  thesis  we 
review  only  the  basic  Simplex  and  Dual  Simplex  methods. 

In  this  chapter  is  also  included  a  definition  of  the  linear 
programming  problem,  the  fundamental  theorems  involved,  and 
certain  definitions  required. 

The  references  consulted  in  this  review  include 


Gass  (1958),  Kuhn  and  Tucker  (1956),  and  Vajda  (1961).  The 
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The  first  reference  Gass  (1958)  Is  an  excellent  Introductory- 
text  to  the  new  discipline  of  Linear  Programming. 

§  4.2  Definition  of  the  Problem 

Linear  Programming  is  concerned  with  optimizing  a 
linear  expression  -  the  objective  function  -  subject  to 
linear  constraints. 

*  ip 

e.g.  maximize  p  x  =  z  ;  p, x  :  nxl  , 

subject  to  Ax  <  b  ^  x  >  0  ;  A  :  mxn  ;  b  :  mxl . 

The  constraints  can  be  rewritten  as 

Ax  +  Imy=  0  ;  Im:  mxm  ;  y  :  mxl  , 
where  the  y^ ' s  are  called  "slack  variables". 

The  following  compact  notation  is  also  used  to 
define  the  problem. 

maximize  j  p^x  I  Ax  <  b,  x  >  oj  ,  or 

maximize  j p^x  I  Ax  +  Imy  =  b  ,  x,y  >  0 
A  :  mxn,  I  :  mxm  ;  p,  x  :  nxl  j  y,  b  :  mxl  . 

§ _ I . j  Definitions  and  Theorems 

§  4.31  Definitions 

§  4.31.01  A  feasible  solution  is  a  vector  x  :  (n+m)  x  1 
which  satisfies  the  equations  Bx  =  b,  where  B  =  [A,  Im  ]  ; 

*  We  use  the  work  maximize  here  and  in  subsequent  work  as 
standing  for  1.  u.  b.  or  limes  superior  of  f(x)  under  the 
condition  x  eR.  In  this  particular  example  we  have  f(x)  = 

p  x  and  R  is  defined  by  Ax  <  b  ,  x  >  0. 


' 
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A  :  mxn  ,  I  :  mxm  ,  and  the  constraints  x  >  0  . 

§  4.31.02  A  basic  feasible  solution  has  no  more  than  m 
positive  Xj  . 

§  4.31.03  A  non-degenerate  basic  feasible  solution  has 
exactly  m  positive  Xj  . 

§  4.31.0^  An  optimum  feasible  solution  optimizes  the 
objective  function  subject  to  the  constraints. 

§  4.31.03  If  U2  are  ^wo  points  in  n-space  then  a  convex 
combination  of  u-^  u^  is  a  point  u  such  that 

u  =  u-^  +  u^  j  where  0  <  ou  <  1  and  +  aQ  =  1. 

A  set  K  of  points  is  convex  only  if  u^  eK  and  u^  e  K  implies 
that  a  convex  combination  of  u^,  u^  denoted  by  u  e  K. 

§  4.31.06  A  point  u^  e  K  is  an  extreme  point  (vertex)  of 
K  if  u j  cannot  be  expressed  as  a  convex  combination  of  any 
two  other  points  of  K. 

§  4.31.07  The  convex  hull  C(K)  of  any  given  set  of  points 
K  is  the  set  of  all  convex  combinations  of  sets  of  points 
from  K.  C (K)  is  the  smallest  convex  set  containing  K  ; 
e.g.  C(K)  where  K  consists  of  a  finite  number  of  points  is 
a  convex  polyhedron. 

§  4.31.08  Two  linear  programming  problems  are  called  dual 
if  they  are  associated  in  the  following  way  : 

\  T 

maximize  <{p  x  =  z  |  Ax<b^x>0 
bTy  =  w  |  ATy  >  p  ,  y  >  0 


minimize 


0  f:  .  .  a W  t  •  ,  ■  •.  r  .  •  •  i 

: 

. " — . —  -  • — - — 

'.tw  •:  Jr :2c isrj  jc  q  /,;; 

-3  to 


96 


Duality  is  a  symmetrical  relation  and  we  refer  to  one  problem 
as  the  primal  and  to  the  other  as  the  dual. 

§  4.31.09  A  starting  point  of  a  more  detailed  investiga¬ 
tion  ca  0 erning  linear  inequalities  is  the  dual  homogeneous 
system 

Ax  >  0  j  x  >  0  ;  A  :  mxn  ;  x  :  nxl  ,  and 
T 

Au<0,u>0;u:  mxl  . 

§  4.32  Theorems 
§  4.32.01  The  Farkas  Theorem 

If  for  all  vectors  x  satisfying  a  system  of 

T 

homogeneous  linear  inequalities  Ax  >  0  we  have  that  p  x  >  0, 

then  p  is  a  non-negative  linear  combination  of  the  rows  of 

T 

the  matrix  A:p  =  Au.,u>0;A:  mxn  ;  x  ,  p  :  nxl,  u  :  mxl. 

Geometrically,  if  all  the  points  which  are  on  those 

sides  of  all  hyperplanes  1L  :  a^x  =  0  (i=l,...,m),  a^  the 

T 

i-th  row  of  A,  lie  also  on  that  side  of  p  x  =  0  then  p  lies 
within  the  cone  spanned  by  the  a^.  Farkas  (1902),  Zoutendijk 
(i960),  Vajda  (1961),  and  Kuhn  and  Tucker  (1956). 

Proof ;  Let  C  =  j  y  |£JU  u  >  0  ,  y  =  A^u  |  be  a  convex  poly¬ 
hedral  cone  in  n-space.  We  shall  show  that  if  p  /  C  then 

T 

dx  such  that  Ax  >  0  and  p  x  <  0.  This  will  prove  the 
theorem.  Hence  assume  p  /  C.  Let  q  :  nxl  be  the  projection 
of  p  on  C  and  r  =  q-p,  r  :  nxl.  It  follows: 

(i)  for  all  y  e  C  ,  (q-p)T(y-q)  >  0  ,  since  <  0 
implies  that  p  e  C  , 

(ii)  r  q  =  0  ,  since  ^  0  implies  that  p  e  C  . 
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m 

From  (:L)  and  (.11)  wo  obtain  r  y  >  0  for  all 

y  e  C.  It  follows  that  for  the  point  corresponding  to 

T 

in  n-space  we  have  ai  g  C  ;  hence  a^r  >  0  ,  so  that  A  r  >  0 

m  m  rp  m 

holds.  Since  moreover  pr=qr-rr=-rr<0  we  see 
that  r  is  the  vector  looked  for. 

§  4.32.02  Key  Theorem 

The  two  dual  homogeneous  systems  Ax  >  0  and 
T 

A  u  =  0,  u  >  0  ;  A  :  mxn  ;  x  :  nxl - ,  u  :  mxl  ,  always  have 
a  solution  x°,  u°,  satisfying  Ax°  +  u°  >  0  and  u?  a.  x°  =  0 
for  all  i,  where  a^  is  the  i-th  row  of  A. 

The  essence  of  this  theorem  is  that  the  convex 
hull  of  finitely  many  halflines  is  the  intersection  of 
finitely  many  halfspaces,  Kuhn  and  Tucker  (1956),  Good 
(1959),  Vajda  (1961),  and  Zoutendijk  (i960). 

Proof ;  If  a^x  <  0  for  all  x  satisfying  Ax  >  0  then  by 

T  T  T  i 

Farkas 1  Theorem  -  a.  =  A  u  for  some  u  >  0,  so  that  A  u  =0 

i  i 

would  hold  for  some  vector  u  >0  with  u^  >  0.  Hence  either 
a^x  >  0  for  some  x  satisfying  Ax  >  0,  or  A  u  =0  for 

some  u1  >  0,  u^  >  0.  Let  u^  =  0  in  the  former  case  and  x^  =  0 

—  1 

in  the  latter  so  that  a.  x1  +  u'l'  >  0  will  hold  for  all  i. 

1  1 

o  m  i  o  ^  i 

Let  x  =  2  x  ,  and  u  =  2  u  ,  then: 

i=l  1=1 

(i)  Ax°  >  0  follows  from  Ax1  >  0  for  all  i, 

(ii)  A^u°  =  0  follows  from  A^u^  =  0  for  all  i, 

(iii)  u°  >  0  follows  from  u1  >  0  for  all  i, 

(iv)  Ax°  +  u°  >  0  follows  from  a^x^  +  u^  >  0  for  all 
k  and  >  0  for  k  =  i, 

(v)  u?  a.x°  =  0  follows  from  u.ha.x°  =  0  for  all  h. 

V  7  1  1  11 
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This  proves  the  theorem. 

§  4.32.03  Inconsistency  Theorem 

The  system  of  linear  inequalities  Ax  <  0  is 

T 

inconsistent  if,  and  only  if,  A  u  =0  for  some  u  >  0  and 
u±  >  0  for  at  least  one  i,  hence  if,  and  only  if,  one  of 
the  rows  of  the  matrix  A  is  linearly  dependent  with  non¬ 
positive  coefficients  on  the  other  rows  of  A. 

§  4 >32.04  Theorem 

The  set  of  all  feasible  solutions  to  a  linear 
programming  problem  constitute  a  convex  set  whose  extreme 
points  correspond  to  basic  feasible  solutions. 

§  4.32.09  Theorem 

The  objective  function  assumes  its  optimum  value 
at  an  extreme  point  of  R. 

§  4.32.06  Theorem 

Every  extreme  point  has  m  linearly  independent 
vectors  associated  with  it. 

§  3.32.07  Theorem 

There  are  at  least  (^)  sets  of  linearly  indepen¬ 
dent  vectors  for  the  case  Ax  =  b,  x  >  0  ;  A  :  mxn  ;  x  :  nxl  , 
b  :  mxl  . 

§  4.32.08  The  Duality  Theorem 

If  either  the  primal  or  the  dual  problem  has  a 
finite  optimum  solution,  then  the  other  problem  has  a  finite 
optimum  solution  and  the  extremes  of  the  linear  function 
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are  equal,  I.e.,  min  w  =  max  z  (cf.  §  4.31.08)  . 

If  either  problem  has  an  unbounded  optimum 
solution,  then  the  other  problem  has  no  feasible  solutions. 

§4.4  The  Simplex  Method 

The  Simplex  method  of  Dantzig  is  the  fundamental 
method  in  the  discipline  of  Linear  Programming.  By  the 
Simplex  procedure,  we  can,  once  any  basic  (extreme-point) 
feasible  solution  has  been  found,  obtain  an  optimum 
feasible  solution  in  a  finite  number  of  steps.  The  approach 
consists  of  obtaining  new  feasible  solutions  where  our 
choice  is  such  that  the  value  of  the  objective  function 
never  decreases  in  the  maximization  and  never  increases  in 
the  minimization  problems.  Since  only  a  finite  number  of 
extreme  point  solutions  exist  the  optimum  value  of  the 
objective  function  will  eventually  be  found.  The  numerical 
computations  performed  in  using  the  Simplex  method  are  simple 
and  straightforward.  The  degenerate  case  -  the  case  when 
too  many  hyperplanes  pass  through  an  extreme  point  -  will 
lead  to  cycling  in  the  method  but  this  problem  can  be 
easily  dealt  with. 

§  4.41  Formulation  of  the  Simplex  Method 


We  now  proceed  to  define  the  Simplex  Method  by 
means  of  an  example. 


maximize 


maximize 


pTx  |  Ax  <  b  ,  x  >  0  >  ,  or 


T  ,  ( 

p  x  |  Ax  +  I  y  =  b,  x,  y  >  0>  ,  where 


-1 


-2  1 


10  0 


Ml 


p  =  (  x)  ;  A  =  (  1  -2)  ;  Im  =  (0  1  0)  ;  y  =  (y2)  ;  b  =  (2)  . 


m 


11  0  0  1  y^ 

The  above  problem  can  be  represented  more 
economically  by  the  following  tableau 


X1 

X2 

b 

-2 

1 

2 

1 

-2 

2 

1 

1 

5 

1 

-1 

yi 

Yo 


Y' 


. .  ...  ..  . . .  .  .  ...  . 
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The  tableau  is  interpreted  as: 


-2Xi 

+  *2 

+ 

1 — 1 

=  2 

1 — 1 

x 

-2x2 

+ 

OJ 

=  2 

1 — 1 

X 

+  x2 

+ 

y3 

=  5  .  and 

Xn 

-  x0 

+ 

Z 

=  z 

1 

2 

0 

Our  initial  basic  feasible  solution 

0 
0 

=  (  2  )  gives  an  objective  function  value 

2 

5 

for  zQ  =  0.  Clearly,  our  basis  at  this  point  consists  of 
(y2.y2.y3)  • 

The  Simplex  method  allows  us  to  determine  the 
coordinates  of  a  new  extreme  point  by  an  interchange  of 
variables  which  is  accomplished  as  follows: 

(i)  We  choose  a  pivot  in  A  as  follows: 

(a)  the  pivotal  column  is  first  determined  by 
the  algebraically  smallest  element  in  the 
objective  function,  i.e.  in  the  bottom  double 
lined  row  (in  our  example  the  pivotal  column 
is  determined  by  x^  =  -1)  ; 

(b)  the  pivotal  row  is  next  determined  by  the 

algebraically  largest  ratio  of  a.  ,  .  /, 

i=l,2, (in  our  example  i  =  1  since 

1/2  >  1/5  >  -2/2)  . 


(  y1  ) 

y2 

y3 
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At  this  point  our  pivot  or  pivotal  element  Is  a^0. 
(il)  Divide  all  quantities  in  the  pivotal  row  by 
pivotal  element. 

(ill)  Divide  all  quantitites  in  the  pivotal 
column  by  the  negative  of  the  pivotal  value  and  replace  the 
pivot  by  l/pivot. 

(iv)  Replace  all  other  quantities  a^j 

(i^  piv.row.,  j^  piv.col.)  by  a, .  +  (a  .  .)  x  (a.  .  n  \ 

v  r  *  >  or  ^  '  *  ij  v  piv.row. v  i, piv.col.) 

/ ^apiv . row . j  piv . col . ^  * 

(v)  Interchange  the  variables  defined  by  the 
pivot  (in  our  example  we  interchange  x^  and  y^). 

(vi)  Repeat  from  (i)  until  all  coefficients  in 
the  bottom  double  lined  objective  function  row  are  positive. 

(N.B,  In  the  following  example  we  have  incorporated  a  slight 
variation  of  the  above  procedure.  The  variation  is  that  we 
multiplied  all  quantities  in  the  pivotal  row  by  the  pivot  at 
the  end  of  (ii)  above.) 

e.g.  Interchange  1. 


X1  x2 

b 

xi  yi 

b 

i 

ro 

|M 

2 

-2  1 

2 

1  -2 

2 

Yn  (ii)  ,  (iii)-^ 

3  2 

6 

1  1 

5 

(iv)  &  (v) 

y^ 

i — i 

i 

ro 

3 

1  -1 

z 

-1  1 

2 

lx 


2 


y 


2 


Y' 


i(i)a 
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Interchange  2. 


1 — 1 

1 — 1 

X 

b 

y3  yl 

b 

-2  1 

2 

OJ 

X 

1 — 1 

2/3  1/3 

4 

-3  2 

6 

v  (ii)  .  (ill) 

3  2  - 

1  1 

9 

i — 1 

1 

m| 

3 

y3— ll)b  (lv)  &  (v) 

1  -1 

3 

-1  1 

2 

Z 

1/3  2/3 

3 

|i) a 


The  progress  in  basic  feasible  solution  and 
objective  value  is  given  by  the  sequence 


x-^=0 

o 

II 

1 — 1 
>< 

y3=o 

x2=0 

yi=° 

yi=° 

yi"2  )  , 
y2=2 

xq=2 

(  2  )  . 
y2=6 

(  X2=4 
y2=9 

y3=5 

y3=3 

3x1=3 

N 

O 

II 

O 

zo=2 

zo=3 

Our  final  tableau  is  now  interpreted  as: 


2y3 

+ 

y-, 

+ 

3x2  . 

=  12 

y3 

+ 

i — 1 

+ 

II 

CM 

=  9 

y3 

- 

^i 

+ 

3X1  = 

=  3  , 

y3 

+ 

i — i 

+ 

3z  = 

=  9  . 

Clearly,,  we  have  arrived  at  the  optimal  solution  since  y^ 
and  y^>  cannot  be  decreased  and  still  remain  positive;  hence, 
no  interchange  of  variable  can  increase  the  value  of  the 
objective  function.  This  particular  example  can  be  solved 
graphically  without  difficulty.  The  importance  of  the 
Simplex  method  becomes  evident,  however,  in  a  problem  which 
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incorporates  a  large  number  of  constraints  in  a  large  number 
of  variables. 

A  more  comprehensive  treatment  of  the  Simplex 
method  is  available  in  Vajda  (1961)  and  Gass  (1958). 


§  4.42  The  Dual  Simplex  Method 


From  the  definition  of  the  dual  and  the  duality- 
theorem  we  observe  that  to  each  tableau  of  the  Simplex  method 
there  corresponds  a  dual  tableau,  and  to  an  optimal  tableau 
of  the  primal  problem  there  corresponds  an  optimal  tableau 
of  the  dual  problem.  To  fix  our  ideas,  assume  we  have  the 
maximization  problem  defined  by: 

\  T 

maximize  s  p  x  |  Ax  <  b  ,  x>0 
described  by  the  tableau 


minimize 


1 — I 

X 

X2 

*  *  '  xn 

b 

1 — 1 

1 — 1 
cti 

al2 

• • *  aln 

1 — 1 

a21 

a22 

b2 

aml 

amn 

b 

m 

lEj — 

-P  2 

•■••’Pn 

yl 

y2 


y 


m 


z 


From  the  dual  defined  by 

mm  1 

b  u  |  A  u  >  p  ,  u  >  0  S.  and  depicted  by  the  tableau 


U]_  *  * 

-u 

m 

P 

1 — 1 

1 — 1 

cti 

a  n 
ml 

"Pi 

"P2 

a  n 
nl 

anm 

~Pn 

.  Vv" 

.  b 

ID 

■ 


r  m  1  l0T:q  I.s  .  s  . 


•  ’ 


. 
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we  can  clearly  represent  both  the  primal  and  the  dual  by 
the  following  tableau 


l — 1 

* 

X2 

Xn 

b 

1 — 1 

3 

1 

1 — 1 

£5 

U2 

A 

:  mxn 

-u 

m 

b 

m 

P 

'Pi 

1 

ro 

••-pn 

1 — 1 
> 

v- 

.  .  v 

n 

w 

We  observe  that  in  the  dual  problem  we  can  assume 
that  the  quantitites  in  the  double-lined  row  all  have  a 
positive  sign  but  we  don't  insist  upon  feasibility  of  the 
variables  (though  still  restricting  ourselves  to  basic  solu¬ 
tions)  .  Whenever  we  have  a  negative  basic  variable,  we 
transform  the  tableau.  Since  the  primal  and  its  dual  can  both 
be  depicted  by  the  same  tableau  we  note  that  we  can  solve  the 
dual  problem , assuming  a  solution  exists,  by  solving  the 
primal  problem.  We  observe  that,  up  to  the  final  tableau, 
we  are  dealing  in  the  dual  problem  with  solutions  which  are 
not  feasible.  We  have  overshot  the  target  and  we  must  trace 
our  steps  back  to  a  feasible  solution  which  is  then  optimal. 

4.3  The  Transportation  Problem 

The  transportation  problem  -  posed  initially  by 
Hitchcock  (1941)  and  later  independently  by  Koopmans  (1947)  - 
can  be  described,  in  its  simplest  form,  as  follows: 

A  homogeneous  product  is  to  be  shipped  in  the 
amounts  a2,  .  am,  respectively,  from  each  of  m  shipping 


. 
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origins  and  received  in  amounts  b-^,  b0,  .  ..,  bn,  respectively, 
by  each  of  n  shipping  destinations.  The  cost  of  shipping  a 
unit  amount  from  the  i-th  origin  to  the  J-th  destination  is 
c^j  and  is  known  for  all  combinations  (l,j).  The  problem  is 
to  determine  the  amounts  x^j  shipped  over  routes  (i,j)  so  as 
to  minimize  the  total  cost  of  transportation. 

Clearly,  we  can  assume  that  the  total  amount 
shipped  is  equal  to  the  total  amount  received;  i.e., 

2  a .  =  2  b  . 

<  1  n  0 


Furthermore,  since  a  negative  shipment  is  not  meaningful  we 

restrict  each  x.  .  >  0. 

J 

The  mathematical  formulation  of  the  transporation 
problem  is  given  by: 


minimize 

subject  to 


and 


Z  Z  c .  .  x.  . 
i  j  1J  ^ 

the  constraints 


n 

2 

X.  . 

=  a. 

i  =  1, 

i — I 

II 

*r~D 

1J 

i 

m 

2 

X.  . 

=  b  . 

j  =  1, 

i — 1 

II 

•H 

1J 

J 

m 

n 

2 

a. 

=  2  bj 

i — 1 

li 

•H 

_L 

i — 1 

II 

.  .  ,  m 


.  .  ,  n 


x.  ,  >  0 
ij  - 


all  i,  j 


A  more  formal  definition  of  the  transportation  problem  using 
the  previous  notation  is  formulated  as: 


mo  ■ 
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minimize  s  Z  2  c.  .  x. . 

I  i  j  1J  1J 


X=a,  XT=b,  2a^=2b  j  ;  x^ j ,  a,  b  >  o(  ; 

1  J  \ 


C  ,  X  :  mxn  ;  a  :  nxl  ,  b  :  mxl  . 

Another  formulation  of  the  transportation  problem 
is  given  by: 


minimize  j  c  X  |  AX  =  PqJ  X  >  (|  ; 

X,  c  :  mnxl  ,  PQ  :  (m+n)xl  ,  A  :  (m+n)xmn  . 

For  m  =  2  and  n  =  3  .>  we  obtain  the  following  5  (i.e.,  min) 
equations  in  6(i.e.,  mn)  unknowns: 

=  a- 


ii  +  xi2  +  xi3 


i 


x2  2^  "t  x22  x23  a2 


x 


11 


+  x 


21 


x 


12 


+  x 


22 


=  b- 


=  b. 


x 


13 


+  x2^  =  b 


Clearly,  the  first  equation  is  redundant  and  does  not  need  to 
be  included  in  the  system.  Generalizing,,  we  note  that  one 
equation  can  always  be  eliminated,  and  the  transportation 
problem  reduces  to  m+n-1  independent  variables  in  mn  equations. 

We  observe  that  the  transportation  problem  involves 
generally  a  sparse  matrix  in  its  formulation.  Hence,  the 
various  techniques  now  available  take  advantage  of  the 
properties  associated  with  sparse  matrices.  A  more  extensive 
treatment  of  the  transportation  problem  is  available  in  Gass 
(1958),  Vajda  (1961),  etc. 
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CHAPTER  V 

NONLINEAR  PROGRAMMING 

§  5.1  Introduction  and  Definition  of  the  Problem 
§  5.11  Introduction 

The  linear  programming  model  is  too  restricted 
an  approach  for  many  mathematical  programming  problems.  The 
nonlinear  programming  problem  requires  that  a  nonlinear 
function  be  optimized  subject  either  to  linear  or  nonlinear 
constraints.  A  capital  distinction  between  the  two  types 
of  problem  may  be  emphasized  here:  the  optimum  solution  of 
the  linear  programming  problem  lies  at  a  vertex  of  the  region 
determined  by  the  linear  constraints,,  i.e.  if  the  solution 
exists;  in  the  nonlinear  problem  the  optimum  solution  need 
not  be  on  the  boundary. 

A  special  case  of  nonlinear  programming  defined 
as  quadratic  programming  -  the  problem  of  minimizing  a  convex 
quadratic  function  (or  its  dual).,  subject  to  linear  constraints 
-  is  studied  at  some  length  in  this  thesis.  A  number  of 
methods  are  now  available  which  range  from  a  direct  exten¬ 
sion  of  the  Simplex  method  to  the  more  sophisticated  "Method 
of  Feasible  Directions"  and  "Gradient  Projection"  methods. 

The  "Gradient  Projection"  by  Rosen  (i960)  can  be  described  by 
a  simple  computing  algorithm  and  solved  efficiently  on  an 
electronic  computer. 

We  do  not  discuss  the  more  difficult  nonlinear 
programming  problem  defined  as  convex  programming  -  the 
problem  of  minimizing  a  convex  function  (or  maximizing  a 
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concave  function)  in  a  convex  region.  For  other  techniques, 
an  extensive  bibliography  on  linear  and  nonlinear  programming 
has  been  compiled  by  Riley  and  Gass  (1958) . 

§  5.12  Definition  of  the  Nonlinear  Programming  Problem 

§  5-12.01  The  nonlinear  programming  problem  requires  e.g. 
that  a  convex  function  f(x)  be  minimized  subject  to  the 
constraints  h(x)  >0,  x  >  0  ;  h(x)  :  mxl  ,  x  :  nxl  ,  where, 
in  the  general  case,  both  f(x)  and  h(x)  are  nonlinear.  This 
problem  is  formulated  more  compactly  as: 

Minimize  jf(x)  |  h(x)  >  0  ,  x  >  0  |  ;  h(x)  :  mxl  ,  x  :  nxl  . 

We  shall  assume  that  f(x)  is  differentiable  with  continuous 
partial  derivatives. 

§  5.12,01  A  typical  quadratic  programming  problem  may  now 
be  defined  as: 

\  t  t  i 

maximize  <p  x  -  1  x  Cx  |  Ax  >  b  ,  x>0 

i  2 

C  :  nxn  symmetric,  non-negative  definite,  A  :  mxn  ;  p  , 
x  :  nxl  ,  b  :  mxl  .  The  constraints  Ax  >  b  are  also  denoted 

by 

a.x>b.  ,  i  =  1 ,  2,  ...,  m:  a.  :  Ixn  . 

l  —  i  l 

§  5.2  Definitions  and  Theorems 
§  5.21  Definitions  and  Notation 

§  5.21.01  A  point  xb  e  R  is  called  a  feasible  point  where  R 


is  the  closed  convex  region  which  consists  of  all  points  for 
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V\  Vn 

which  e(x)  >  0  ,  where  e1(xu)  =  a^x  -  ;  x  :  nxl  ,  e  :  mxl 

a^  :  lxn  . 

T 

§  9-21.02  A  useful  normalization  is  n^  =  (k  a)^  where  k 

T 

is  defined  by  n^  n^  =  1,  i=l,  2,  . . . ,  m.  The  constraints 

T  1 

e  (x)  =  Ax  -  b  >  0  can  then  be  written  e(x)  =  Nm  x  -  b  >0 
where 


N  —  [  n-,  ,nn,  .  .  .  ,n  ]  :  nxm  . 
m  —  L  1*  2  J  m 

§  5.21.03  The  (n-l)  -  dimensional  manifold  defined  by 
e^(x)  =  0  is  a  hyperplane  which  is  denoted  by  PL,  i.e. 
H±  :  e±(x)  =  n±Tx  -  b^ 
points  "into"  R. 


0 


We  use  the  convention  that  n^ 


§  3.21.04  The  unit  vector  s  :  nxl  with  initial  point  xb  e  R 
is  termed  a  feasible  direction  if  3  A ^  >  0  such  that  for  all 
A  ,  0  <  A  <  A^  ,  x  =  x^+As  e  R  . 

§  9.21.09  The  nonlinear  programming  problem  has  a  constrained 
global  maximum  if  g(x°)  =  Vf(x°)  ^  0  for  x°  e  R  .  The  non¬ 
linear  programming  problem  has  an  interior  global  maximum  if 
g(x°)  =  0  for  x°  e  R  . 


§  9.21.06  The  intersection  of  q  linearly  independent  hyper¬ 
planes  in  n-space  is  an  (n-q)  -  subspace  which  is  defined  as: 


Q  :  ei(x)  =  a±x  -  b±  ,  i  =  1,  2,  . q  . 

The  remaining  space  is  denoted  by  Q  and  since  QAQ  =  0  then 

_  m 

for  V  e  Q  ,  w  e  Q  (v,w:nxl)  we  have  v  w  =  0  . 


§  9-21.07  A  matrix  Pq  -  Nq  (NqTNq)  1  NqT  is  a  projection 

matrix  which  takes  any  vector  in  n-space  into  Q  ;  "P_  :  nxn  , 

Q 

symmetric.  A  matrix  =  I  -  PA  is  a  projection  matrix  which 

q  q 


takes  any  vector  in  n-space  into  the  intersection  Q  . 


§  5.22  Theorems  and  Lemmas 

The  following  Theorems  and  Lemmas  are  considered 
in  greater  detail  in  Rosen  (i960). 

§  9.22.01  Lemma 

Let  x°  £  Q  .  A  necessary  and  sufficient  condition 
b  T 

that  the  point  x  =  x  +  s  be  in  R  is  that  N^s  >  0. 

Proof :  Assume  that  for  some  i  ,  i=l,  2,  ...,  q,  say  i=l  , 

T 

n-^s  <  0.  Then  since 

/  b\  T  b  ,  ~  , 

e^(x  )  =  n-^x  -  b-^  =  0  ,  we  have 

/  \  T  /  b .  \  •>  T  .  ~ 

e-j^(x)  =  n^(x  +s)  -  b-^  =  n-^s  <  0  . 

Thus  x  violates  the  constaint  e^(x)  >  0  ,  and  is  therefore 

not  in  R.  This  proves  the  theorem. 

§  5.22.02  Theorem  I 

Let  n^  ,  1-1,  2,  q,  be  a  set  of  q  linearly 

independent  vectors,,  and  the  projection  matrix.  A  necessary 
and  sufficient  condition  that  a  nonzero  vector  y  be  linearly 
independent  of  the  set  n-j_,  i=l,  2,  q,  is  that  P^y  ^  0  . 

§  9.22.03  Theorem  II 

Let  x^  be  a  boundary  point  of  R  which  lies  on 
exactly  q,  1  <  q  <  n  ,  hyperplanes ,  which  are  assumed  to  be 
linearly  independent.  Let  the  intersection  of  these  hyper¬ 
planes  be  the  manifold  Q,  .  Then  the  point  x^  is  a  constrain¬ 
ed  global  maximum  of  f(x)  ,  if  and  only  if  P^  g(x^)  =  0  ,  and 


Ill 


(Nq  Nq) "1  Nq  g(xb)  <  0  . 

§  3.22.03  Quadratic  Dual  Theorem 

Let  the  primal  and  dual  quadratic  programming 
problems  be  given  as  : 

minimize  j  f (x)  =  pTx  +  lxTCx  |  Ax  >  b  ,  x  >  o|  ; 

C:nxn  symmetric,  non-negative  definite,  A:mxn  ;  p,x:nxl, 
b : mxl  ,  and 

im  T  T 

g(u,y)  =  b  y  -  _lu  Cu|Ay-Cu<p,  y>0 

2 

u:nxl  ,  y:mxl  .  Then  the  optimum  values  of  f(x)  and  g(u,y) 
are  equal. 

The  proof  of  this  theorem  is  available  in  Dorn 
(1958)  and  Vajda  (1961) . 

§  9-3  Quadratic  Programming 


Let  us  now  consider  the  special  case  of  nonlinear 
programming  known  as  quadratic  programming:  the  problem  of 
minimizing  a  convex  quadratic  function  (or  its  dual),  subject 
to  linear  constraints.  Three  methods  applicable  to  this 
problem  are  considered  in  this  chapter;  they  are 

(i)  convex  programming  by  extension  of  the  Simplex 

method, 

(ii)  method  of  Feasible  Directions, 

(iii)  Gradient  Projection  method. 

The  last  two  methods  can  be  extended  to  the  more  general 
problem  but  the  procedures,  in  general,  are  then  no  longer 
finite . 

§  9.31  Convex  Quadratic  Programming  by  Extension  of  Simplex 


Method 


Let  us  define  the  quadratic  programming  problem 


-  ••  . . .  -w«  >  ■  .  •  •  •  •  - 

:  lb  :.3  o cl  am.-X'  - .  .1 

:  ■  1.0.  '  <  ■:  x.xxj  :  ; 

■ 

■  m.  j.  ‘  :o  . ■  neriT  .  Ixm:  v  .  f.  m 

.  .  .oe  T  ■  :  :v  "•  0  .  0  xq  ...  'X 


•  £•  •  'Z,s&  x  x  ;■> 


s  I  . c  d-  :e.bi.  ••'•D  won  sjj  :  . 

tlx-  l:  x  ,o  ...  ?  i  js^bx  xp  xovnco  jS  g.  ...  x_.  X^Luha 

.  ••d.n  i  -Jid:  0  •  xertll  od 
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as : 

min.  jf(x)  |  Ax  >  b,x  >  oj  ;  where  f(x)=p^x  +  x^Cx  ; 

C  :  nxn  symmetric,  non-negative  definite,  A  :  mxn  ;  p  ,  x  : 
nxl  ,  b  :  mxl  .  A  simple  example  (written  out  in  full)  is 
given  by: 


The  gradient  of  the  quadratic  objective  function 
is  given  in  the  general  case  by 

g(x)  =  Vf(x)  =  p  +  Cx 
and  in  our  example  as 


/  \  /  — 44\  .  /  16  -12\ /  1\ 

s^x)  _  ^ _42  +  -12  34^x2^  * 

Our  approach  is  to  begin  with  an  initial  feasible  solution 

(e.g.  xi  =  x2  =  ^  an<^  ^ en  increase  some  nonbasic  variable 

x^  if  8f/8x^  <  0  .  In  our  example  we  choose  to  increase  x-^ 

since  Of/dx-^  =  -44  <  0  .  An  increase  in  the  chosen  nonbasic 

variable  is  useful  until  either  Of/dx^  or  one  of  the  basic 

variables  becomes  zero.  To  fix  our  ideas  we  illustrate 

the  example  by  the  following  tableau: 


X1 

X2 

b 

-8 

6 

-22 

2 

1 

10 

8 

-6 

-22 

-  6 

17 

-21 

-22 

-21 

0 

X3 

(x]_) 

(x2) 

(1) 
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From  the  above  tableau  we  observe  that  the  objective  function 

can  also  be  written  as 

xn  8xn-6x0-22 

f^x)  =  J,_^1^x0^  +  ^x1,x2^-6x1  17x2"2l)  ' 

The  Initial  feasible  point  at  x-^  =  x2  =  0,  u^  =  -22,  xQ  =  10 
gives  an  objective  function  value  of  f(x)  =  0  (lower  right 
hand  corner  of  tableau)  .  The  initial  basis  consists  of  u^ 
and  Xo  .  The  constraints  at  the  starting  point  are  given  in 
the  upper  part  of  the  tableau,  i.e. 

u1  =  -22  +  8x-^  -  6x2  and 
x^  =  10  -  2x^  -  x^ 

We  note  that  an  increase  in  the  nonbasic  variable  x-^  is 
useful  until  either  u^  or  x^  becomes  zero.  Since  we  keep 
x^  fixed  at  zero,  the  first  equation  holds  for  a  smaller 
x-^  .  We  now  interchange  variables  x-^  and  u^  ;  i.e.  the 
basic  variables  become  x^  =  2.75  j  x^  =  4.5  and  the  non¬ 
basic  variables  are  x2  =  u^  =  0  .  This  interchange  of 
variables  is  accomplished  in  two  steps;  the  first  step  is 
equivalent  to  the  interchange  of  variables  in  the  Simplex 
method;  the  second  step  is  equivalent  to  the  exchange  of 
variables  outside  the  brackets  and  does  not  present  any 
difficulties.  The  new  variable  u^  can  take  positive,  zero, 
or  negative  values;  hence,  we  call  it  a  free  variable. 

We  become  interested  in  such  a  free  variable  u.  when 

i 

Sf/8u^  /  0  .  We  introduce  a  new  free  variable  u  .  =  Sf/Su^ 

and  then  proceed  to  make  this  new  free  variable  nonbasic 

by  interchanging  the  free  variables  u.  and  u.  .  Once  the 

J 
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hitherto  nonbasic  free  variable  u  has  been  made  basic 
again,  it  can  be  ignored,  because  it  does  not  matter  If  it 
becomes  negative  through  a  change  In  the  value  of  another 
variable,  nor  are  we  concerned  with  Its  value  in  the  final 
solution. 

We  have  not  yet  proved  that  our  procedure 
terminates  in  a  finite  number  of  steps.  The  procedure  is 
finite  if  a  rule  is  followed  concerning  the  choice  of  non¬ 
basic  variables  to  be  made  basic.  This  rule  demands  that 
if  there  exist  any  nonbasic  free  variables  u^  ,  such  that 
df/du^  /  0  ,  then  one  of  them  must  be  made  basic  in  a  manner 
which  reduces  f(x)  .  A  new  nonbasic  free  variable  will  thus 
be  introduced: 

df/du.  =  u.  ,  say. 

An  expression  for  f(x)  in  which  the  linear  terms 
contain  no  free  variable  is  said  to  be  in  "standard  form". 

To  begin  with,  there  exist  no  free  variables  at  all.  The 
following  types  of  transformations  can  arise: 

(i)  introduction  of  a  new  nonbasic  free  variable, 
in  exchange  for  a  sign-restricted  one  (first  cycle  in  the 
following  example) ;  it  can  be  proved  that  the  new  variable 
will  not  appear  in  the  linear  term. 

(ii)  exchange  of  two  sign-restricted  variables 
(second  cycle  in  the  example);  a  linear  term  in  a  free 
variable  can  appear,  but  then  it  will  lead,  by  our  rule,  to 

(iii)  the  introduction  of  a  new  free  variable, 
not  in  a  linear  term,  and  to  the  disappearance  of  a  free 
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variable  which  had  been  in  a  linear  term  (third  cycle  in 
the  example) . 

As  long  as  there  are  linear  terms  in  free  variables, 
steps  of  type  (iii)  will  be  made.  Hence,  after  a  finite 
number  of  steps,  we  reach  again  a  standard  form.  But  there 
exist  only  a  finite  number  and  therefore  after  a  finite 
number  of  steps,  we  reach  a  minimum. 

This  extension  of  the  Simplex  method  to  the 
quadratic  programming  problem  is  credited  to  Beale  (1955); 
it  is  simple  in  principle  and  easy  to  execute  by  hand  compu¬ 
tation  for  the  smaller  problems.  This  review  and  the  follow¬ 
ing  example  is  based  on  the  work  by  Vajda  (1962) . 

§  5.31.01  Example 

Let  us  now  consider  the  example  first  introduced 
in  §  5.31  in  greater  detail;  i.e. 

\  xi  1  16  -12  x,  7 

min.  (-44,-42)  (^)+i(x1,x2)(_12  34  )  (x^)|  2x1+x2+x^=10 ,  x>oC  . 

The  gradient  of  f(x)  is  given  by 

g(x)  =  Vf(x)  =  (_42)  +  (_12  34) ()  . 

The  solution  of  this  particular  problem  is  obtained  in  three 
cycles  as  follows : 

(i)  introduction  of  a  new  non-basic  variable  u-^ 
in  exchange  for  a  sign-restricted  x-^, 

(ii)  exchange  of  two  sign-restricted  variables 

x2  and  X3, 

(iii)  introduction  of  a  new  free  variable  u2 
and  the  disappearance  of  a  free  variable  u^ . 
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The  choice  of  variable  which  Is  to  be  Introduced  Into  the 
basis  is  determined  by  the  composition  of  the  linear  term. 

We  choose  as  our  initial  basis  x^  =  10  ;  i.e.  x-^  =  xn  =  0  . 

(i)  An  examination  of  the  linear  term  with 
components  p^  =  -44,  p0  =  -42  reveals  that  both  are  negative. 
We  choose  -  the  choice  is  arbitrary  -  to  increase  the 
variable  x^  until  either  g^  =  df/dx-^  becomes  zero  or  the 
constraint  2x-^  +  x2  +  =  10  is  violated,  whichever  occurs 

first.  Here  the  first  condition  holds  for  a  smaller  x1  . 

We  introduce  a  new  free  variable  into  the  basis  by  adding 
an  additional  constraint  to  our  problem;  i.e. 

u^  =  6f/6x1  =  -22  +  8x-^  -  6x2  j 

and  increase  x-^  until  u^  =  0  . 

The  interchange  problem  (i.e.  introduction  of  a 
sign-restricted  variable  into  the  basis  in  exchange  for  a 
free  variable)  denoted  by 

u1  =  -22  +  8x^  -  6x2  , 

x^  =  10  -  2x^  -  x^  , 

f (x1,x2)  =  (-22  +  8x1  -  6x2)  x1 

+  (-21  -  6x1  +  17x2)  x2 

+  0  -  22x^  -  21x2 

is  conveniently  represented  by  the  tableau 
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X1 

X2 

b 

-8 

6 

-22 

2 

1 

10 

8 

-6 

-22 

-6 

17 

-21 

-22 

-21 

0 

Up) 

U2) 

(1) 


The  variables  to  be  interchanged  are  x-^  and  . 
The  above  tableau  illustrates  this  choice  by  a  negative 
coefficient  of  x-^  in  the  last  row  (the  linear  term)  and  the 
smallest  positive  ratio  of  the  quantities  in  the  b  and  x^ 
columns  of  each  row,  i.e.  min.  (-22/8  ,  10/2)  .  We  observe 

that  x^  is  held  fixed  at  zero.  Notice  that  in  the  same  way 
as  we  would  ignore  a  positive  coefficient  of  x-^  in  the 
constraint  x^  =  10  -  2x^  -  x^  ,  w e  could  ignore  a  negative 
coefficient  of  x-^  in  the  expression  Of/dx^  .  Either  case 
would  indicate  an  unbounded  value  for  f(x). 

The  interchange  of  variables  x-^  and  u-^  is 
accomplished  in  two  steps,  first  by  interchanging  inside 
the  brackets  using  the  Simplex  method,  second  by  interchang¬ 
ing  outside  the  brackets.  The  interchange  outside  the 

2 

brackets  is  accomplished  by  replacing  the  u^  term  by 
l/8u1/8x1  and  replacing  all  other  terms  in  the  bottom  part 
of  the  tableau  by  zero. 
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Thus 


X1 

X2 

b 

ui 

X2 

b 

U1 

X2 

b 

-8 

6 

-22 

U1 

-.125 

-.75 

2.75 

X1 

1 1 

2 

1 

10 

X3 

-25 

2.5 

4.5 

x3  ^ 

8 

-6 

-22 

(*]_) 

1 

0 

0 

Up) 

.125 

0 

0 

-6 

17 

-21 

(x2) 

-.75 

12.5 

-37.5 

(x2) 

0 

12.5 

-37.5 

-22 

-21 

0 

(1) 

0 

-37-5 

-60.5 

(1) 

0 

-37.5 

-60.5 

X3 

(u±) 

U2) 

(1) 


The  objective  function  can  now  be  written  as 

u.,  ,  .250  u 

l (u1,x2)=- 60. 5+ (0, -75) (x^) +2 (u1. x2) ( 0.25) ^ • 

We  note  that  there  is  no  linear  term  involving  u^  in  the 
objective  function. 

(ii)  An  analysis  of  the  linear  term  (  the  last  row) 
now  reveals  it  is  worthwhile  to  increase  x2  ^  up  to  the  smaller 
of  4. 5/2. 5  (when  x^  becomes  zero)  and  37-5/12.5  >  when 
8f/8x2  =  -37-5  +  12  .5  *2  becomes  zero.  The  smaller  of  these 
ratios  is  the  first,  so  that  we  interchange  the  two  sign- 
retricted  variables  x2  and  x^.  Thus 


ui 

X2 

b 

U1 

x3 

b 

U1 

x3 

b 

-.125 

-.75 

2.75 

xi 

-.05 

.30 

4.1 

X1 

.25 

2.5 

4.5 

X3  _ 

.10 

.  40 

1.8 

X2 

II 

.125 

0 

0 

(Up) 

.125 

0 

0 

(ux) 

0.25 

0.5 

1.5 

0 

12.5 

-37.5 

(x2) 

-1.25 

-5 

-15 

(x2) 

0.5 

2 

6 

0  - 

-37.5 

-60.5 

(1) 

3.75 

15 

-128 

(1) 

1.5 

6 

-155 

(ux) 

u3) 
(1)  . 
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The  above  Interchange  Is  accomplished  In  a  manner  similar  to 
(i)  and  will  not  be  reviewed  in  detail.  The  objective  function 
is  now  written  as 

u,  . 50  1  u 

f(u1,x3)=-155+(3,12)(xJ;)+i(u1Jx3)(  x  4)(xh  . 

We  observe  that  we  no  longer  have  a  "standard  form"  since  a 
free  variable  is  present  in  the  linear  term. 

(iii)  It  is  now  necessary  to  return  to  a  "standard 
form",  i.e.  we  decrease  u^  -  an  increase  is  undesirable  -  until 
either 

4.1  +  0.05u-^  =  0  ,  or 
1.5  +  0.25u1  =  0  . 

This  happens  the  first  time  when  u^  =  -6  ,  and  then  the  new 
variable 


u^  =  1.5  +  0.25u1  + 

equals  zero.  We  exchange  the  free  variables  u^  and  u^.  Thus 


U1 

x3 

b 

U2 

X3 

b 

U2 

X3 

b 

-.05 

.30 

4.1 

X1 

-.2 

.4 

3.8 

X1 

.10 

1 

1.8 

x2 

.4 

.2 

2.4 

X2 

11 

-.25 

-.5 

1.5 

u2 

-1 

2 

-  6 

u 

0.25 

.5 

1.5 

(ax) 

1 

0 

0 

(ux) 

4 

0 

0 

0.5 

2 

6 

(x3) 

2 

1 

3 

(x3) 

0 

1 

3 

1.5 

6 

-155 

(1) 

6 

3 

-164 

(1) 

0 

3 

-164 

Now  an  increase  of  X3  or  a  change  of  u^  cannot  reduce  the 
objective  function 


( ^2 ) 
(x3} 

(1)  • 


U, 


f  (u2,x3)=-  L64+(0,6)  (x^)  ,]- 


2 


(u 


2^3)  ( 


8  0 
0  2 
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and  we  have  reached  the  minimum  with  coordinates  x^  =  3.8  , 
x,,  =  2.4  and  function  value  f  =  -164. 

That  we  are  indeed  at  the  optimum  point  is  quite 
obvious  since  the  coefficient  of  x^  in  the  linear  term  is 
positive;  the  coefficient  of  u0  in  the  linear  term  is  zero; 
and  both  terms  in  the  quadratic  portion  of  f(x)  appear  in 
the  form  of  squares  with  positive  coefficients.  Clearly,  we 
cannot  increase  x^  or  vary  u^  without  increasing  the  value  of 
the  objective  function. 


§  3. 31.02  The  Basic  Computing  Algorithm 
We  define 


D  :  ( n+m+1 ) xl 


1  T 
-ip 

:  Ixn 

i  f(x) 

0 

2 

; 

i 

1 

1  l 

2° 

:  nxn 

i  2"  P  • 

!  nxl 

n 

1 

1 

n+1 

nt 

:  mxn 

1  b : mxl 

- 

J 

n+m 

X  <_ 

•H 

"x 

4 

-  X 

(j) 

i=l, . . . ,n  j=n+m+l 


i.e.  the  introduction  of  a  free  variable  in  exchange  for  a 
sign-restricted  one 

Z<^=^>  x(i)  x(  j) 

1=  :i+l,  .  „  . ,  n+m.  j=h+m+l 


i.e.  introduction  of  a  new  free  variable  in  exchange  for  and 
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removal  of  a  free  variable. 

f(x)  =  p]  x  +  -Jj  x^Cx  ;  g(x)  =  p  +  Cx  ;  x  :  nxl . 


comment 


0; 

i«- 

0 

i; 

i<- 

i+1 

i  : 

n  +1 

CM 

lit 

•X 

last  term? 

D(i): 

n+1 

,  -  1 

is  this  a 
free  variable? 

B(l,i)  : 

0 

1 - 1 

lit 

•\ 

is  coefficient 
zero? 

B ( n+m+2  ,  £ ) 

<-  -B(n+i+l,  £) 

£=1,2, . . . ,n 

B(n+m+2,n+l)  B(n+i+l,n+l)  introduce  free 

variable 


Z  <=> 


1 


2; 


3; 


4; 


il  <-  min  B(l,i) 
i 

B(l,il):  0 


i=lj ...jn 


is  sign  +  ? 


Aa 

il"“ 

Ab 

k 

A  <_ 


-B(il+lj  n+1) 
B(il+l,il) 


mm 


B(n+i,n+l)< 
B(n+i,  il)' 


/,b  ,b  \ 
mm  (V,,  X±1) 


X  <=> 

->  0 


Y  x(i)^=^x(j)  ;  i=n+l,  .  .  . ,  n+m,  j=n+m+l 


i.e.  the  exchange  of  two-sign  restricted  variables. 
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§  5-32  Method  of  Feasible  Directions 

Let  us  continue  our  study  of  the  quadratic  pro¬ 
gramming  problem  by  applying  the  method  of  feasible  directions. 
We  define  our  problem  as  follows: 

maximize  j  f (x) |  Ax  >  b  ,  x  >  oj  ,  where  f (x)  =  p^x  — x^Cx  ; 

C  :  nxn  symmetric,,  non-negative  definite,  A  :  mxn  ;  x  ,  p  : 
nxl  ,  b  :  mxl . 

The  method  of  feasible  directions  is  a  method  of 
steep  descent  and  consists  of  determining  a  sequence  of  feas¬ 
ible  solutions  with  ever-increasing  values  for  the  objective 
function.  At  each  major  cycle  of  our  procedure  we  have  to 
determine 

(i)  a  feasible  direction  for  which  the  objective 
function  increases  in  value, 

(ii)  the  length  of  the  step  to  be  taken  in  this 

direction. 

The  main  contribution  of  the  method  of  feasible 
directions  to  quadratic  programming  is  the  application  of  the 
Simplex  method  for  solving  the  direction-finding  problem. 

If  some  of  the  linear  constraints  defined  by  Ax  >  b 
are  equalities  instead  of  inequalities  we  can  either  eliminate 
some  of  the  variables  or  we  can  always  require  a^s  =  0 
instead  of  a.s  >0  for  the  equality  constraints  a.x  =  b.  and 
x  eR  .  Elimination  of  variables  reduces  the  size  of  the 
problem;  but  a  simple  form  of  A  and  f(x)  -  if  any  exists  - 
may  be  lost  . 

It  is  appropriate  at  this  point  to  review  the 
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notation  applicable  to  this  method.  The  function  e(x)  Is 
defined  as  e(x)  =  Ax  -  b  and  clearly  if  xbeR  then  e{x°)  >  0. 
A  hyperplane  is  defined  by  :  e^(x)  =  0  and  the  convention 
is  used  that  the  normal  to  the  hyperplane  points  "into"  the 
region  R.  The  "best  feasible"  direction  is  that  direction 
defined  by  the  unit  vector  s  and  the  gradient  vector 
gk  =  g(x°)  ,  i.e.  the  unit  vector  s  for  which 
ais  >  0  ,  if  e.(xb  =  0  , 

T 

ss=lj  (l) 

T  b  . 

s  g  is  a  maximum. 

We  assume  here  that  g^  /  0  ;  i.e.  x^  is  not  the  required 
solution. 

Since  we  are  interested  only  in  those  directions 

m  -u 

for  which  f(x)  increases  in  value,,  we  must  have  s  g  >  0  and 
hence  we  can  replace  (l)  by 

ais  >  0  ,  if  e^(x°)  =  0  „ 

sTs  <  1  ,  (2) 

T  b  . 

s  g  is  a  maximum. 

Problem  (2)  has  the  same  solution  as  (l)  whenever 

T  b 

max  (s  g  )  >  0  in  (l)  ;  and  it  has  a  solution  with  a  value 

T  b 

zero  whenever  max  (s  g  )  <  0  holds  in  (l).  The  nonlinear 
T 

constraint  s  s  <  1  in  (2)  can  be  replaced  by  the  weaker  set 
of  conditions  that  each  component  of  s  has  to  lie  between 
+  1  and  -  1  so  that  we  obtain 

ais  >  0  ,  if  e1(xb)  =  0  „ 

-1  <  s.  <  1  ,  all  1, 

T  b  . 

s  g  is  a  maximum. 


(3) 
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Although  the  solution  of  (3)  in  general  differs 

from  that  in  (2)  we  do,  however,  satisfy  the  condition 

T  b  T  b 

max  (s  g  )  >  0  in  both  cases.  If  max  (s  g  )  =  0  holds  in 

either  (2)  or  (3)  we  shall  not  obtain  a  direct  increase  in 

T_ 

f(x)  if  we  go  from  xD  in  the  direction  s  and  for  a  concave 

f(x)  it  follows  that  we  have  arrived  at  an  absolute  minimum. 

The  direction-finding  problem  will  now  be  deferred  until  we 

have  considered  a  few  more  concepts. 

Assuming  we  have  available  a  feasible  direction 

b  b  T  b 

s  which  satisfies  (g  )  s  >  0,  we  want  to  make  a  step  in 
that  direction,  so  large  that 

(i)  none  of  the  constraints  will  be  violated  by 

T_  T_ 

the  new  trial  solution  x  =  x°  +  Asu  ; 

b  b 

(ii)  f(xD  +  As0)  will  be  maximized  as  a  function 
of  A  under  the  condition  (i)  above.  There  is  a  restriction  in 
the  choice  of  A;  since 

e^(x)  =  e^(x^+Asb)  =  e^(x^)  +  Aa^s^  >  0 


must  hold,  we  must  choose  <  'h0  where 

),  for  those  i 


,b  ,  ,ehx0 

A  =  min  (- 


a^s 


for  which  a^sD  <  0  .  Hence  we  may  reformulate  the  problem  as 

max.  (f(xb+  Asb)  )  ,  for  fixed  x°  and  sb. 

0<A<Ab 


§  5*32.01  Explicit  Formulation  of  Method  of  Feasible  Directions 


The  quadratic  programming  problem  is  defined  as: 


d 

/■  ;  v  ■  •  i: 0 . ,  ; 

-  ■  sr’/inJ  op  trio".jo3-£io  J'jSitf 


maximize 


,  where 
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^f(x)|Ax>b  x>0 
f(x) 


nT  ..  T~ 

P  x  -  1  x  Cx  ; 

■2 


C  :  nxn  symmetric  non-negative  definite, 

A  :  mxn  ;  p  ,  x  :  nxl  ,  b  :  mxl  . 

Using  the  relation 

g(x)  =  p  -  Cx  ,  we  have 

|£(x+As)  =  g(x+As)Ts 
T  T 

=  g  s  -  As  Cs  ,  so  that 


df 


-^-(x+As)  =0  if  A  =  A  ,  defined  by 


T 


-Xa  To 

A  =  g  s 
T 

s  Cs 
T, 


(We  have  siCs  >  0  .  If  sxCs  =  0  then  we  put  Aa  =  H 


since  f(x)  is  then  linear  along  the  line  x  =  x^  + 
It  follows  that 
A  =  min  (Aa,A^)  where 

A^  =  min  /e.(x^)\  ,  for  those  i  for  which 

i  (  1  ) 

•  T  . 

I  a±s  | 

The  iteration  formulae  are  : 


k+1  k  .  k 

x  =  x  +  A^s  , 


k+1  k 

g  =  g 


A  Csk  ,  k  =  1,2, 


,n. 


_p  /  k\  ,  /  k\  T  k  —  -pr  ->  2  /  k\  Tn  k 
Af ( x  )  =A,  ( g  )  s  2  A  (s  )  Cs  , 


k 

1  . 

2  Ak 


k 

/  kvT,/  k+1 \ T  I  k 

(g  )  +(g  )  >s 


-L  -V  /  k\  T  k  •  JT>  -\ 

=  o  V  (s  )  s  ,  if  A  =  A 


k 


k 


00  , 

As .  ) 

T 

a.  s  <  0  , 

l 


and 
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The  procedure  in  general  is  not  finite  In  the 

quadratic  case.  It  can  easily  be  made  finite  by  using  the 

theory  for  solving  linear  equations  as  developed  by  Hestenes 

and  Stiefel  (1952) .  Clearly,  the  problem  of  finding  the 

unrestricted  maximum  of  the  strictly  concave  quadratic 

T  IT 

function  f(x)  =  p  x  -  x  Cx  is  equivalent  to  solving  the 
linear  system  Cx  =  p  . 

Let  x°  and  an  initial  direction  s°  define  a 
second  approximation  x^  of  the  solution  a  by  means  of 

1  o  ,  -v  o  ... 

x  =  x  +  with 

(  Os  T  O 

=  (g  )  S  O  „  O 

Ao  ' -  and  g  =  p  -  Cx 

f  OnL  o 


Next  we  choose  a  new  direction  s^  which  is  conjugate  to  s°. 


in  the  sense  that 


(s1)iCs°  =  0  . 


k 


Having  obtained  the  approximation  x  of 


.  -i  ^  k  n-  k 
cx  a 1 1  ci  p  u  o  g  =  p  -  0  x  , 


we  choose  a  new  direction  sk  which  is  conjugate  to  s°,  s\ 

k-1  , 

. . . ,  s  ,  so  that 


(i) 


(sk)TCsJ  =  0  (  j  =  0,1, . . ,,k-l)  , 

and  we  define  xk+H  xk  +  X,sk  with  g  =  (gkH  sk 

(sk)TCsk 

The  conditions  (l)  do  not  determine  sK  uniquely. 

One  method  of  choosing  the  vector  s  is  to  require  (g  )  s 
k  T  k  k 

maximum  and  (s  )  s  <  1  ,  (or-l<s<l)  .  This  method 
will  lead  to  the  solution  a  in  a  finite  number  r  <  n  of 
steps . 
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If  we  have  to  maximize  a  concave  quadratic  function 
T  IT 

f(x)  =  p  x  -  x  Cx  ,  subject  to  a  set  of  linear  inequal¬ 
ities  a^x  >  b^  (i=l, 2, . . . ,m) ,  the  procedure  of  Hestenes  and 
Stiefel  cannot  be  applied  without  modification  since  the 
point  a  /  R  (generally) .  The  modified  procedure  is  then 
as  follows:  we  assume  that  we  are  on  the  correct  set  of 
hyperplanes (the  set  in  which  the  maximum  lies);  and 
proceed  to  conjugate  successive  directions  s°,  s\  . .  .,  sk_\ 
either  until  we  arrive  at  a  step-length  defined  by  A^?  or 
until  we  arrive  at  a  maximum.  In  the  former  case  our 
assumption  is  false;  we  drop  the  additional  requirements 
since  we  are  not  working  on  the  correct  set  of  hyperplanes 
and  add  the  hyperplane  determined  by  A^13  to  our  set  of 
hyperplanes  on  which  we  assume  our  maximum  lies  and  add 
the  constraint  a,  s  >  0  since  e.(x)  =  0  .  We  now  start 
conjugation  of  successive  directions  s0,  s^,  .  .  .  ,  s^~\ 

as  before. 

§  3.32.02  Determination  of  Optimum  Feasible  Directions 

The  direction-finding  problem  is  defined  as 

B  s  >  0  , 

sTs  <  1  ,  (1) 

T  . 

g  s  is  a  maximum, 

B  :  mxn  ;  s,g  :  nxl  .  The  matrix  B  has  n  columns  and  m 
rows  corresponding  to  constraints  in  which  the  trial 
solution  lies. 

T 

If  max  (g  s)  >  0  we  may  normalize  and  restate 


128  - 


our  direction-finding  problem  as 
B  s  >  0  , 

gTs  =  1  ,  (2) 

T 

s  s  is  a  minimum. 

This  problem  can  now  be  solved  as  a  standard  quadratic 
programming  problem  but  the  following  special  technique 
is  available. 

According  to  the  solution  criterion  of  Kuhn  and 

Tucker  (1951)  the  gradient  at  the  optimum  point  must  be 

a  non-negative  linear  combination  of  the  outward-pointing 

normals  in  that  point,  i.e.  an  optimal  vector  s  of  (l) 

must  also  satisfy  the  relations 

T 

g  =  -B  u  +  2£s  ,  where  u  are  Lagrange  Multipliers 
uTBs  =0  (3) 

u  >  0  ; 

u  :  mxl  ;  £  ;  a  scalar.  The  above  result  is  also  immediately 

available  from  the  Parkas  Theorem  (§  4.32.01)  if  we  replace 

A  by  B  .  Hence  a  normal  may  not  be  taken  into  account  if 

the  optimum  point  does  not  lie  in  the  corresponding  con- 

T 

straints,  so  that  u.  =  0  holds  if  b . s  <  0  ,  i.e.  u.b.s  =  0, 

i  i  i  i 

where  b^  is  the  i-th  row  of  B. 

T 

Multiplying  g  =  -B  u  +  2£s  by  -B  and  defining 
2£Bs  =  v,  we  obtain 

BBTu  -  v  =  -Bg  * 
uTv  =  0  ,  (4) 

u  >  0  ,  v  >  0  ;  v  :  mxl  . 

Having  found  such  vectors  u  and  v,  we  obtain  2ps  by  the 


|  ,  -  :  v  ;  . ..  -  •  -  ' 

...tnioq  ;/ a  .  /  at  ciLiuH'toa 
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,T 


relation  2f3s  =  g  +  B  u  ,  and  s  by  normalizing  although 
this  normalization  is  not  necessary  in  the  direction-find¬ 
ing  problem.  The  event  (3  =  0  (hence  ps  =  0)  leads  to 
T  T 

g  s  =  -u  Bs  =  0  ,  so  that  we  have  then  arrived  at  a 
stationary  point. 

Consider  (4)  and  start  with  u  =  0  ,  v  =  Bg  . 

If  v  >  0  holds,  we  have  the  required  solution;  other¬ 
wise,  we  take  a  negative  v  .  and  carry  out  a  transformation 

J 

of  the  Dual  Simplex  algorithm  by  taking  the  diagonal 

element  in  that  row  as  the  pivot;  we  thus  satisfy  the  rela- 
T 

tion  u  v  =  0  automatically  throughout  the  calculation. 
Cycling  can  be  prevented  in  this  procedure  by  choosing  the 
negative  right-hand  member  in  a  special  simple  way.  If 
we  are  dealing  with  equalities  instead  of  inequalities  the 
corresponding  yds  and  hence  also  the  v^'s  have  to  be  zero; 
the  corresponding  u^’s  are  no  longer  sign-restricted  in 
this  case.  We  can  thus  start  by  eliminating  the  v^'s 
involved  from  the  basis  (replace  them  by  the  corresponding 


Again,  the  starting  point  of  the  direction¬ 


finding  problem  can  be  described  by  the  tableau: 


u 


1  .  .  . 


u 


m 


bls  V1 


BB 


,T 


0  .  .  . 


0 


. 


. 


c.  .  i  ■'  ......  O' Iv  ■  .  fee  fl  ■.■is 
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where  bj  is  the  j-th  row  of  B. 

§  5-33  Gradient  Projection  Method 

Let  us  define  our  programming  problem  as : 

\  m  7  TIT 

maximize  sf(x)  |  N  x  <  b  ,  x  >  0  J>  *  where  f(x)=p  x+^x  Cx)  , 

and  the  set  of  m  vectors  defined  by  N  —  [  nn  ,  n~,  .  .  .  ,  n  1 

J  m  -  1  r  2  mJ 

T 

is  normalized  so  that  nq  nj_  =  1  *  1=1, 2,..., m  ;  C  :  nxn 

symmetric,  non-negative  definite,  N  :  nxm  ;  p,  x,  n-^ ,  n^,  .  .  .  , 

n  : nxl, b : mxl . 
m 

The  gradient  projection  method  starts  with  a 

feasible  point  x°e  R,  where  R  is  defined  by  the  constraints 
T 

N^x  <  b,  and  proceeds  in  a  direction  defined  by  the  gradient 
of  the  objective  function  projected  into  the  space  which 
is  the  intersection  of  all  hyperplanes 

Q  :  H±  :  n?  x  -  b±  =  0  ,  i  =  1,  2,  . . . ,  q, 
which  are  currently  satisfied  as  equalities;  if  this  pro¬ 
jection  is  not  positive,  then  either  the  optimum  has  been 
reached  (and  criteria  are  given  to  determine  whether  this 
has  happened),  or  one  of  the  constraints  is  ignored  and  we 
proceed  along  the  component  of  the  gradient  of  the  remain¬ 
ing  intersection  space.  A  positive  projection  of  the 
gradient  will  increase  the  value  of  the  objective  function 
and  we  proceed  along  it. 

The  contents  of  this  chapter  are  divided  into 
four  sections.  In  section  one  we  consider  the  calculational 
problems  involved  in  the  gradient  projection  method. 

During  an  optimization  calculation  it  is  necessary  to  form 

T  t  - 1 

a  square  matrix  N  N  and  the  inverse  matrix  (N  N)  from  N 


p 


a  matrix  whose  columns  are  a  set  of  independent  vectors. 

A  new  Inverse  obtained  by  adding  or  deleting  one  column 
of  N  is  usually  required  at  each  step.  The  computing 
algorithm  is  used  freely  to  describe  the  recursive  rela¬ 
tions  which  give  the  new  inverse  and  corresponding  pro¬ 
jection  matrix.  The  step-size  problem  is  also  reviewed 
in  this  first  section. 

In  sections  2  and  3  we  define  the  basic  com¬ 
puting  algorithm  for  the  gradient  projection  and  solve  an 
elementary  example.  In  the  last  section  we  consider  some 
difficulties  which  may  arise  in  carrying  out  the  calcula¬ 
tions  of  the  preceding  sections;  these  difficulties  in¬ 
clude  linear  dependence  of  constraints,  rounding  error, 
solution  improvement,  and  the  application  of  the  gradient 
projection  method  to  the  linear  programming  problem.  In 
preparing  this  chapter,  we  have  made  use  of  earlier  work 
by  Rosen  (i960) . 

§  6.33.01  Preview  of  Calculations 

In  the  course  of  an  optimization  calculation 
using  the  Gradient  Projection  method  it  is  necessary  to 
obtain  the  projection  of  the  gradient  vector  on  various 
intersections  Q.  Each  new  intersection  will  be  determined 
by  a  set  of  hyperplanes  which  differs  from  the  previous 
set  by  one.  For  example,  one  of  the  hyperplanes  may  be 
dropped  from  the  set;  a  hyperplane  may  be  added  to  the  set; 
a  hyperplane  may  be  dropped  and  another  one  added;  and 
finally  the  set  may  remain  unchanged.  We  may  note  that 


there  appears  to  be  a  formal  similarity  between  an  inter¬ 
change  of  hyperplanes  in  the  gradient  projection  method  and 
an  interchange  of  variables  in  the  Simplex  method.  For  the 
purposes  of  the  following  discussion  on  the  calculational 

problems  involved  in  the  Gradient  Projection  method  we  shall 

T  -1 

assume  that  the  inverse  matrix  (N  N  )  is  known. 

v  q  q' 

The  recursive  relations  used  which  permit  a 

T  - 1 

hyperplane  to  be  dropped  from  (N^N^)  with  approximately 
2 

q  multiplications,  and  a  hyperplane  to  be  added  with  approx¬ 
imately  3qn  multiplications  are  based  on  the  formula  for  the 
inverse  of  a  matrix  in  terms  of  the  inverses  of  certain  of 
its  partitions.  In  particular,  suppose  a  square  matrix  A  is 
partitioned  as 


A  =  [ 


1 — 1  « 

< 

A2 

A3 

a4 

] 


(1) 


where  A-^  and  A^  are  square  matrixes  and  A A^  are  rectangular 
matrices  with  the  appropriate  number  of  rows  and  columns. 

Then  if  both  A^  and 

A0  =  A4  -  A3Aj1A2  (2) 

are  nonsingular,  the  inverse  of  A  exists  and  is  given  by 


A 


-1 


B  = 


y 


(3) 


where 


IV 
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For  computational  efficiency  we  note  that  the  desired 


-1 


quantities  should  be  obtained  in  the  specific  order,  A-^  , 


A-^A"1,  AQ,  B^,  B^,  B2,  and  B1 . 
readily  available  and  given  by 


An 


-1 


expression  for  A1  is 


A. 


=  B-, 


b2b4  b3  . 


(5) 


1  "  ""I 

The  relation  (5)  will  be  applied  to  the  following 

problem.  Given  a  set  of  q  linearly  independent  hyperplanes 

and  the  corresponding  inverse  matrix  (NqNq)  ,  to  drop  one  of 

the  hyperplanes  from  the  set  and  to  obtain  the  corresponding 

inverse  matrix  for  the  (q-l)  hyperplanes.  In  particular,  we 

assume  (N^N^)  is  known  and  that  is  to  be  dropped  and 
T  - 1 

(N  N  ,)  obtained,  where 

H  T. 


Let 


N  —  [  n,  , n0,  .  .  .  ,n  ]  . 

q-l  —  L  1*  2’  q-l  J 

tT 


A  =  N  N 
q  q 

A.  =  NT  ,  N  , 
1  q-l  q-l 


A 


(6) 

qxq  , 

(q-l)x(q-l) 


(7) 


- 1  T  —  1 

We  assume  that  B  =  A  =  (N  N  )  is  known  and  therefore 

v  q  q' 

that  the  partitions  B^,  B^,  B^,  and  Bq  of  B  are  also  known. 

In  particular,  Bq  is  a  symmetric  (q-l)x(q-l)  matrix,  B^  a 

T 

(q-l)-  dimensional  column  vector,  B^  =  B^  ,  and  Bq  a  scalar. 

T  - 1  - 1 

The  required  inverse  (N  _-^N  _^)  =  A^  is  then  given  by  (5). 


If  is  to  be  dropped,  the  relation  (5)  applies  if  the  £- th 
and  q-th  row  and  column  of  (N^N^)  are  interchanged  before 
it  is  applied. 

The  procedure  for  adding  a  hyperplane  to  the 
inverse  is  now  described.  For  this  purpose  it  is  assumed 


T  \  - 1 

that  (N  ,N  )  is  known  and  it  is  desired  to  add  H  to  the 
v  q-1  q-1'  q 

inverse  matrix.  It  follows  from  (7)  that 

m  m 

A2  "  VlV‘q  =  A§  ’ 


T 

Ai,  =  n  n  =  1 
4  q  q 


(8) 


A0  is  a  (q-1)-  dimensional  column  vector,  and  A^=l  follows 
from  the  normalization  of  the  hyperplanes.  From  (2),  (7), 
and  (8)  it  follows  that 

A  =  nTn  -  N  (NT  -.N  -1)“1NT 
o  q  a  q-lv  q-1  q-1'  q-1 


q  q 

r 

‘q' 


q-1  q- 

=  "q^1  *  Nq-l^q-lV^'X-F 


q-i 7  nq 


T 

=  n  P  n 
q  q-1  q 


(9) 


It  can  be  shown  that  A  is  also  defined  by 

o 

Ao  =  (Pq-lnq)  ^Pq-lnq^  Pq-lnq‘_ 


(10) 


A  necessary  and  sufficient  condition  that  Aq  >  0  ,  is  that 
Hq  is  linearly  independent  of  the  intersection  H^, i=l, 2, . . . , 
q-1  .  Assuming  this  to  be  the  case, (4)  gives 

T 


B1  =  <Nq-lVl)_1  + 

B2  =  Ao  rq-l  =  B3 

Bi.  =  A”'*'  ,  where 

4  o  3 

-1  -  <Nq-lNq-P~X-lnq)  . 


q-1  x  q-l  q 
We  also  note  that 


(ii) 


(12) 


P  ,  n  =  n 
q-1  q  q 


N  q  r  . 
q-1  q-1 


(13) 


A  very  useful  recursive  relation  is  available 


T  -  _L 

between  P  ,  P  ,  and  n  .  Again  let  (N  N  )  =B  as  in  (3) 


-i 
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and  the  B  ,  1=1, 2, 3,4,  as  given  In  (11).  Comparing  (10)  and 

(6),  the  matrix  N  can  be  written  N  =  [N  -,,n  ]  and  similarly 
v  '  ’  q  Q  q-1  q 

for  its  transpose.  From  its  definition,  T?  is  therefore 
given  by 


Pq  ■  tVl’V  B  “q-1, 

L  T  -I 
n 

q 


(14) 


After  carrying  out  the  matrix  multiplications  and  sub¬ 
stituting  for  B  from  (11),  the  following  is  obtained, 

P  =  P  ,  +  A-1 ( P  1  n  )(P  1n  )T  .(15) 
q  q-1  o  v  q-1  q' v  q-1  q'  v  ' 

Let  u  denote  the  unit  vector  parallel  to  P  _-.n  , 

Q  Q.  Q. 


u  =  P  n  n 
q  q-1  q 


P  i  n 
q-1  q 


(16) 


Subtracting  both  sides  of  (15)  from  I  :  nxn  unit  matrix  gives 

T 


P  =  P  ,  -u  u' 
q  q-1  q  q 


(17) 


The  recursion  relations  which  have  just  been 
described  also  make  it  possible  to  build  up  the  matrices 
(N^  N^)  and  P^  for  a  set  of  H  linearly  independent  vectors 
u.  ,  i  =  1  ,  2  ,  with  a  minimum  of  computation. 

The  problems  of  adding  and  subtracting  a  hyper- 
plane  from  (N^N^)  and  the  construction  of  the  associated 
projection  matrix  will  now  be  described  by  computing 
algorithms . 

Computing  Algorithm  -  Adding  a  Hyperplane 

The  following  algorithm  performs  the  operations 

defined  as: 


( 


(NTN  )  1 

v  q  q' 


q 


(NT  N  . )  1 
v  q-1  q-1 ' 


q-1 


)  • 
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For  purposes  of  ease  of  description  we  also  define 


(  B  >  [ 


1 — 1 

CVI 

b3 

:  b4 

N^-1)  . 


^  Nq-lnq 

r  n  +-  (NT  nN  n  )_1t 
q-1  v  q-1  q-1' 


t  <-  n  -N  -i  r  -] 
q  q-1  q-1 

T 

A  <-  tXt 
o 


B4-  1/K 


Br 


-Bi.r  , 
4  q-1 


T 

B3  <-B2 


-1 


B1  ^q-A-N'Vq-l 


u 


q 


q 


b4  t 


P  -u  u 
q-1  q  q 


T 


END 


t : (q-1) xl 


t :  nxl 


Aq:  scalar, 

A  >  0 
o 


B2: (q-l)xl 


u  :  nxl 

q 


P  :  nxn 

q 


Computing  Algorithm  -  Subtracting  a  Hyperplane 

The  operations  performed  by  this  algorithm  are 

defined  as: 


( 


K-iVN'1 

Pq-1 


Kv1 


) 


(Nq-lVh’1-®!  -  (B^  B2B2T 

A2  ~  Nq-1  nq 


A0: (q-l)xl 
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Computing  Algorithm  -  Building  the  Projection  Matrix 

The  purpose  of  this  algorithm  is  to  construct 
from  H  linearly  independent  normalized  vectors 

E  [n1>n2> . . . ,ng  ] 

the  following: 

and  ^ 


x 


..a  j.  j-  o=  A  "■!' 


1'  !'  r'j3 
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During  an  optimization  calculation  we  must  always 
solve  the  step-size  problem.  We  can  assume  that  we  have 
available  the  gradient  of  f(x)  i.e.,  g(xk)  =  p  +  Cxk  and 
the  projection  matrix  P  .  The  necessary  and  sufficient  con¬ 
ditions  (theorem  2)  that  xk  be  a  constrained  global  maximum 
are  Pq  g(xk)  =  0,  and 

r  =  (N^Nq)_1Nq  g(xk)  <  0  j  r  :  qxl  . 

If  we  are  not  at  a  maximum  we  test  r  and  drop  that  hyperplane 

defined  by  \  i 

max  <r.>  >  0 

i  (  ^ 

from  the  projection  matrix.  In  this  way  we  obtained  a 
revised  projection  matrix  P.  The  manifold  Q  has  a  dimension¬ 
ality  of  one  (a  line)  at  least.  We  define  the  unit  vector 

s  =  Pq  g(xk)/  |  Pq  g(xk)  |  . 

T  k 

It  follows  that  Nqs  =  0  ,  so  that  x  =  x  +  As  is  in  Q  for 

any  A  .  By  assumption,  xk  lies  on  only  the  q  hyperplanes  in 

Q,  ,  i=l,  2,  .  ..,  q,  and  since  it  is  a  feasible  point, 

/  k\ 

e^(x  )  >  6  >  0,  i=q+l,  ...,  m  .  Then  for  each  of  the  remain¬ 
ing  m-q  hyperplanes,  PL,  i=q+l,  ...,  m,  there  may  exist  a 

value  A  =  A^  such  that  e^(x)  =  0  .  |  A^  |  is  the  distance 

k  k 

from  x  to  the  hyperplane  ;  the  distance  from  x  to  the 

hyperplane  H^is  along  a  parallel  to  s.  pn  particular,  we  shall 

define  A^  as  the  minimum  quantity  chosen  from  the  set  of  A.. 

which  are  positive, 

i.e.  A13  =  mins  A^  =  e^(xk)/s^n^  >  5  >  0  (  , 

T  1  j  b 

if  s  n.  >  0  and  i  =  q+1,  ...,  m  .  The  distance  A  represents 
the  largest  step  that  can  be  taken  from  xk  in  the  direction 
s  without  leaving  R  (the  region  defined  by  the  constraints 


\*o  ‘-lileb 


;  "  .;,o  a  s  II  ■■  ,  ...  . ... 
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N^x  <  b) 
m  —  ' 


A  further  restriction  on  the  step-length  Is  that 


we  want  to  maximize  f(xk+  As)  as  a  function  of  A  ,  i.e 


-^(x+As)  =  (Pg(xk+As)  )Ts 

m  rn 

=  g  Ps-As  CPS  , 


and  for  a  maximum  step-size  we  have 


Aa  = 


T  T 

5  s/s  Cs  ,  since  Ps  =  s  . 


Hence,  the  step-size  used  is  given  by  A  = 


min  (A^A13) 


Computing  Algorithm  -  Step-size  Problem 
We  wish  to  find  A  defined  by: 


i; 


A  =  min  (Aa,A^) 

d 


A 


A 


A 


t 

s 

t 

b 

a 

i 

b 

i 

b 

b 

b 

b 

b 


A 


Pg(xkf 
t/  |  t  | 
Cs 
sTt 
gTt/b 

q 

L 

i+1 

T 

n^s 


0 

i 

0 

b 


e±(xk)/-b 


J 

m 


A  min(Aa,A^) 


> 


< 


< 


1 


comments 


we  choose  L  very  large 


END 


. 


■'  i::o  ,  -.r.  ,  •  •  :? 
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§  5-33.02  The  Basic  Computing  Algorithm 

k 

For  every  feasible  point  x  which  is  not  a  global 

maximum,  another  feasible  point  can  be  found  with  a  larger 

value  for  f (x)  .  The  following  algorithm  is  given  for  an 

k  k 

arbitrary  feasible  point  x  e  R.  We  assume  that  x  lies  on 

the  manifold  Q  formed  by  the  intersection  of  q  linearly 

independent  hyperplanes.  For  purposes  of  this  algorithm  we 

define  the  following:  „  T 

f  (x)<=>  p  x  -  ^  x1Cx  , 

*  (NqV\  \ 

*  *  *  > ^ 

T 


I  -  NqBNq 


i; 


3; 


p-Cx 


g  :  0 

t  * 
r  * 
t  * 


T 

N  g 

qe 

Bt 


p  s 

max< 

L 


A 


P 


0 

0 

0 


Kv 


-l 


2; 


q-1  q 

A  min(Aa,  A^) 


x 

A 


x+As 


A' 


END  x  a  global  max? 


,  3 

<  « 


,  -  END 


,  i  1 


remove  a 
hyperplane 


define  step-size 

determine  next 
approximation 


•xn-.w.  a  ;  b 

(  ;  <?, 
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(NqNq)  1  (Nq-lNq-l)  1  add  a  hyper,p!ane 

P  <-  p  _ 

q  q-1 
->  1 

_ END_ _ 

§  3.33.03  Example  of  Gradient  Projection  Method 

The  procedures  and  algorithms  defined  in  the 
previous  sections  will  now  be  applied  to  the  following 
quadratic  programming  problem: 

maxj  f(x)  =  pTx  +  -^x^Cx  |  N^x  <  b,  x  >  0  |  ; 

where  f(x)  =  (3,0,4) (x,)  +  ^  (xpx  ,xJ(  4  -2  1)  (x-.  )  , 

Xp  d  -2  3  0  Xp 

x^  1  0  5  x^ 

100  -1//2  -100 
and  Ny  —  [010  -1//2"  0-10]. 

1  0  0  1  0  0  0-1 

We  note  that 

3  x  4-21 

P  =  (  0  )  ,  x  =  (  xl  )  ,  C  =  (  -2  30)  ; 

4  xi  10  5 

C  :  3x3  ,  symmetric  non-negative  definite.  We  shall  start 
with  the  initial  feasible  point  (x°)T  =  (0 . 1, 0 . 1, 0 . 1 )  .  The 
initial  point  x°  is  feasible  and  does  not  lie  on  any  hyper¬ 
planes  since 

e^(x°)  =  n?x°  -  b^  >  0  ,  i  =  1,  2,  .  ..,  7  . 

Clearly,  since  the  initial  point  is  feasible  and  not  con¬ 
strained  to  lie  on  any  hyperplanes  we  have  an  initial  gradient 
vector  g(x°)  =  p  +  Cx°  ,  an  initial  projection  matrix  PQ  = 

:  3x3  identity  matrix,  and  a  unit  direction  vector 


■  - 


142  - 


s  =  Pg°/  |  Pg°  |  =  g°/  |  g°|  .  Evaluating  the  required 

functions  at  the  initial  value  (x°)T  =  (0. 1, 0. 1, 0. l)  we 
obtain  the  following: 

o  o  3‘3  100  *58 

g  =  g(x  )  =  (  0.1  )  ,  P  =  (  0  1  0  )  ,  and  s  =  (  .02  )  . 

4.6  °  001  .81 

This  particular  problem  will  be  solved  in  three  cycles. 

Each  cycle  will  start  with  a  test  for  the  optimum  point  and 
end  with  the  definition  of  a  new  projection  matrix. 

(i)  To  start,  the  point  x°  is  not  the  required 
optimum  point  since 


PoS 


e°  /  o  . 


We  now  need  to  determine  the  minimum  step-size 


defined  by 


We  recall  that 


A  =  min  (Aa,AD)  . 


=  gTs/  s^Cs  ,  A13  =  min  |  A^  =  e^(x°)/s^n^  >  6  >  0  j  , 


where 


e^(xu)  =  nTx°  -  b^  ,  and  only  those  i  are  considered  for 
T 

which  s  n^  >  0  and  i  =  q  +  1,  . ..,  m.  In  our  case  we  must 
test  for  every  A^  ,  i.e.  i  =  1,  2,  ...,  7  •  We  obtain 


A*5  =  min 


N 


A 


7 


1.11 


Hence  our  step-size  problem  has  the  solution 
A  =  min  (  5-69  ,  1.11^  )  , 

where  the  subscript  7  is  used  to  denote  the  result  that 
hyperplane  will  need  to  be  added  to  our  projection  matrix, 

The  next  approximate  solution  is  given  by 

i  n  0-1  -58  .74 

x-1  =  x  +  As  =  (  0.1  )  +  1.11  (  .02  )  =  (  .12  )  . 

0.1  .81  1.00 


■  v;  w©  L  ' '  'rid-  nJt.scMo 


v  wo;;  :  »; 
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T  o 

Since  the  constraint  e^.  =  nyx  -  tu  determined  the  step-size 

we  include  in  the  projection  matrix.  For  convenience  we 

interchange  constraints  1  and  7,  i.e. 

000  -1//5  -101 
N7  =  [  010  -1//2  0-10] 

'  -101  0  000 

The  intersection  Q  now  consists  of  and  the  projection 
matrix  P-^  is  given  by 


P1  = 


-  n1n1 


T 


10  0 

=  (010) 
0  0  0 


The  following  diagram  depicts  the  constraint  upon  which  we 
continue  our  search  for  the  optimum  point  and  the  path 
followed  in  reaching  the  constraint. 


(ii)  We  again  follow  the  basic  computing  algorithm 

and  test  for  the  optimum  point,  i.e. 

n  i  4.56  4.56 

g  =  g(xx)  =  (  -1.12  )  ,  P.g1  =  (  -1.12  )  ,  r,  =  -9.74  . 

9.74  0  * 

Clearly,  we  are  not  at  the  optimum  point  and  furthermore  we 

cannot  drop  our  hyperplane  from  the  projection  matrix  since 


<  0  . 


i 
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The  step-size  problem  is  now  solved  and  we  obtain 


A  =  min  (  6. 63,  . 01^)  . 

The  next  approximate  solution  is  given  as 

P  ,  .74  .97  .75 

X  =  x  +  As  =  (  .12  )  +  .01  (  -.24  )  =  .12  . 

1.00  .00  1.00 

Again  we  interchange  constraints  2  and  6  obtaining 

000  -1/72  -101 
Ny  =  [  0  -1  0  -I/n/2-  0  10]  . 

'  -101  0  000 

The  intersection  Q  now  consists  of  H-^  and  and  the  pro¬ 
jection  matrix  P2  becomes 

0  0  0 

P  =  (  0  1  0  )  . 

^  0  0  0 

The  following  diagram  now  describes  the  constraint 
(now  a  straight  line  defined  by  the  intersection  of  H-^  and  Hp) 
upon  which  we  continue  our  search  for  the  optimum  point. 


(iii)  Prom  the  basic  computing  algorithm  we 


.-i  •  C  -JO  ;  o, 
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obtain 

p  p  6.76  P  0 

g  =  g(x^)  =  (  -1.14  )  ,  Ppg^  =  (  -1.14  )  ,  r,  =  -9.75  . 

9.75  0  1 

Again  we  continue  our  search  by  solving  the  step-size  problem., 
obtaining 

A  =  min  (  5.65,  . 12g  ) 

S  2 

Our  next  approximate  solution  becomes  x  =  x  +  As 

.75  0  .75 

=  (  .12  )  +  .12  (  -1  )  =  (  0  )  . 

1.00  0  1.00 

We  interchange  constraints  3  and  6  and  construct 

0  0  0 

P,  =  (  0  0  0  )  . 

3  0  0  0 

Clearly,  we  are  now  at  the  optimum  point  with 
T 

coordinates  x  =  (  ,75j  0,  1.00  )  since 

P3  g(x3)  =  0  ,  and 
r£  =  -1.50  <  0  . 

§  5.33.04  Further  Considerations  on  the  Method 

In  this  -  the  last  of  the  four  sections  mentioned  in 
the  introduction  to  this  chapter  -  section  we  consider  some 
important  additions  to  the  gradient  projection  method  which 
greatly  increase  the  practical  value  of  the  method.  These  are 

(i)  a  starting  procedure  from  a  nonfeasible  point, 

(ii)  the  problem  of  linear  dependence  on  constraints 

(iii)  a  correction  procedure  for  improvement  of  the 
solution  required  because  of  build  up  of  rounding  errors,  and 
finally 

(iv)  an  evaluation  of  the  gradient  projection  method 
applied  to  the  linear  programming  problem. 


!  ■  '  ■ 


■i>-  '  '  '-■  ■  ■  ..  ■ '  ’’  ■'  11  ::  l  t  L ;  -:'i>  ;‘IC"^p;u. 
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(l)  We  first  consider  the  problem  of  a  starting 
procedure  for  an  arbitrary  nonfeasible  point  x°  which  is 
defined  by  e(x°)  <  0  ,  e  :  qxl  ,  (  we  assume  that  at  least  one 
constraint  is  violated,  i.e.  e^(x°)  <  0  ,  for  some  i  ,  i~.l  , 

2,  .  .  . ,  q)  .  For  simplicity,  we  assume  that  the  planes  are 
ordered  so  that 


\  <  0  (i  =  1,  2,  .  .  . ,  q) 

ei  j 

L  >  0  ( i  =  q+1,  .  .  .  ,  m) 

and  that  the  q  hyperplanes  f-L,  i=l,  2,  ...,  q,  are  linearly 

independent.  We  define  the  qxl  vector 


e(x)  -  <j  e±(x) ,  .  .  . ,  eq(x); 


By  definition,  we  have  that 


1 


e(x)  =  ITx  -  bq  ;  bq  :  qxl 


Let  x 

Hence  from  (l)  we  have 


=  -  jyN^r1  e(x°) 


1 


elx1)  =  e(x°)  -  NqNq(NqNq)"  e(x°)  =  0 


(1) 

(2) 

(3) 


Thus  x  ,  computed  in  (2),  lies  in  all  q  of  the  selected  hyper¬ 
planes.  If  in  addition  we  have 


e  -^x1)  > 


0 


(i=q+l, . . . ,m) 


1 


then  x  is  the  desired  starting  point. 

If  at  least  one  of  the  quantities  e^(xq)  ,  i=q+l, 

. . . ,  m,  is  negative,  a  minimum  of  these  is  chosen,  say  i=q+l  . 

The  projection  P  n  of  n  , on  the  intersection  Q  of  H.  , 

^  u  q  q+1  q+1  ^  l  J 

i=l,  ...,  q,  and  the  corresponding  r  :  qxl  are  then  obtained 


from 


r  =  (NqNqrXnq+1)  , 


Pn  ,  n  =  n  -  Nr  . 
q  q+1  q+1  q 


rr.)l.  ■:  ..."  ■ 
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There  are  now  three  relevant  possibilities. 

1 .  |  pq  nq+1  |  =  0  ,  r±  <  0  ,  i=l,  . q.  In  this 

case  there  is  no  feasible  solution,  since  it  can  be  proved 
that  no  step  is  possible  in  a  direction  to  increase  eq+1(x) 
without  violating  at  least  one  of  the  constraints  in  Q. 


2-  I  Vq+1 


0  .  r^>  0,  for  at  least. one  i,  say  i=l 


In  this  case  drop  n^  and  add  nq+-^  to  Nq,  giving  Nql=[n2, . . . , nq+1 ] . 
It  can  be  shown  that  N  1  N  1  is  not  singular.  Also,  define 
e  l(x^)  :  qxl  where 


eqlU  }  = 


0,  .  .  .^e^yx1)] 


A  point  x  is  then  obtained  from 


T 


x2  =  x1  -  Nql(Nql  Nql) -1  e^x1)  . 


It  can  be  shown  that 


ex(xd)  >  0  , 

ei(x2)  =  0  ,  (i=2, . . . ,q+l)  . 

3-  1  Pqnq+ll>0-  “  Vl  t0  V  glVlng  Nq+l=[nP "  ”Vl] 


and  obtain  [Nq+qNq+1)  by  the  recursion  relation, 
Also  define  eq+]_(x)  :  q+lxl  where 


'q+1 


(xx)  E  |0'---'0'eq+l(xl) 


1\  - 


2  2 
A  point  x  ,  satisfying  e^(x  )  =  0,  i=l,  ...,  q+1,  is  then 


given  by 


2  1  ,T  /  ,  T  T  ,  T  N  —  1  /  1  \ 

x  =  x  -  Nq+1(Nq+1  Nq+1)  eq+1(x  )  . 


Repeated  applications  of  the  above  procedure  will 
either  obtain  a  feasible  point  using  (2)  or  (3),  or  show  that 
no  such  point  exists  as  in  (l) . 

(ii)  At  this  point  we  have  assumed  that  all  hyper¬ 
planes  used  were  linearly  independent.  In  general,  this  will 
not  be  the  case  either  in  the  application  of  the  gradient 


'  : }  a  ’  s  O'  a  t : :  s  ; ; ::  oq  i  i'  o  us  o 
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projection  method  or  in  other  applications  where  the  projection 

matrix  P  or  the  inverse  (N  N)  is  needed.  Given  an  arbitrary 

set  of  £  vectors  n^  ,  the  recursion  relations  give  a  practical 

method  of  picking  a  largest  linearly  independent  subset  of 

these,  i.e.,  a  subset  of  £'  vectors  which  span  the  £'  subspace 

of  the  given  set  of  £  vectors.  If,  e.g.  in  (11)  we  find 

that  P  n  =  0  ,  then  we  delete  n  from  the  set  of  £  vectors. 
q-1  q  q 

It  can  be  shown  that  this  process  if  continued  gives  the 
desired  subset  of  £' <  £  vectors;  the  matrix  P^  obtained  in 
this  way  takes  any  vector  into  the  intersection  of  the  set  of 
all  the  original  £  hyperplanes,  Rosen  (i960) . 

(iii)  A  correction  procedure  may  be  needed  in  the 
gradient  projection  method  due  to  the  build  up  of  rounding 
errors  in  the  inverse  matrix  (NqNq)  .  These  errors  will 
tend  to  increase  as  hyperplanes  are  dropped  from  and  added 
to  the  inverse,  particularly  when  one  or  more  of  the  hyper¬ 
planes  in  the  inverse  is  close  to  being  linearly  dependent. 

As  a  result,  the  projected  unit  gradient  vector  s  may  have 
small  components  which  do  not  lie  in  the  desired  intersection 
Q.  As  a  result  it  may  happen  that  after  taking  a  step  along 
s  and  adding  a  new  hyperplane  to  Q  the  new  point  x°  may 
violate  some  of  the  constraints  which  define  the  intersection 


Q. 

The  q  hyperplanes  tL,  1=1,  2,  ...,  q,  define  the 

intersection  Q  and  we  suppose  that  the  corresponding  inverse 
matrix  (N^N^)  has  already  been  computed  as  part  of  the 
method.  The  q  values  of  e^(x°)  ,  i=l,  ...,  q,  are  computed. 


. 

■  ■  ":x  'to  j'V  J  .0:.vi..!  ■  ■' ! ; 

■-  -i-'C  1  ■  -  '  .'tv.;  '>  1  .  V 

r  ao 1  .  3 a1,  v  :  .  .iJ  iii  .  ■ 
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These  would  all  be  zero  except  for  the  effect  of  rounding 
error,  and  therefore  they  will  all  be  relatively  small.  Using 
e(x°)  as  defined  a  point  x^  is  obtained  from 

x1  =  x°  -  N  (N^N  )-1  e(x°)  ,  where 

q  q  q 

e(x)  =  NTx  -  b 

v  '  q  q 

If  there  is  no  error  in  (N^N^)  then  it  follows 
that  e^(x  )  =  0  ,  i=l,  q,  and  x  is  the  desired  correct¬ 

ed  point  which  satisfies  all  the  constraints.  It  can  be  shown 
that  if  |  e(x°)  |  <  6  then  |  e  (xb)|  <  5^  .  This  correction 

procedure  is  therefore  essentially  an  error-squaring  process. 
Since  the  inverse  and  e(x°)  are  computed  as  part  of  the  regular 

method,  the  correction  procedure,  when  needed,  is  obtained 

2 

with  less  than  2n  additional  multiplications.  Another  ap¬ 
proach  to  removing  the  effect  of  rounding  error  is  to  construct 

a  new  value  for  (N  N  )  and  P  from  N  —  [nn , n0, . . . , n  ]  using 

v  q  q'  q  q  —  1  1’  2’  ^  qJ  & 

the  appropriate  algorithm. 

(iv)  A  useful  comparison  can  be  made  between  the 
efficiency  of  the  gradient  projection  method  applied  to  the 
linear  programming  problem  and  the  elementary  Simplex  method. 
Let  us  define  the  linear  programming  problem  as 
\  T  T 

maximize  <px  |  Nx>b,x>0 
N  :  nxm  ;  x  ,  p  :  nxl  ,  b  :  mxl  . 

T 

Clearly,  the  linear  objective  function  f(x)  =  p  x 

has  a  constant  gradient  given  by  g(x)  =  p  .  Furthermore,  the 

step-size  will  always  be  determined  by 

Aib  =  min  ^ e . (xk)/  T  >  5  >  0 

i  L  i 
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T 

if  s  n^  >  0  and  1  =  q  +  1,  . . . ,  m. 

o 

At  a  vertex  x  €  R  using  gradient  projection  q  =  m 

T  - 1 

and  the  matrix  Nm  is  square,  so  that  (NmNm)  Nm  =  Nm  .  Further¬ 
more,  for  any  g,  Pmg  =  0  .  The  necessary  and  sufficient  con¬ 
ditions  for  a  maximum  thus  reduce  to 

r  =  N^P  <  0  • 

For  x°  not  optimal  in  gradient  projection,  the  vector 

(where  r^  is  the  maximum  positive  component  of  r)  is  dropped 

from  N  and  the  vector  to  be  introduced  is  chosen  from  those 
m 

n^  not  in  Q  defined  by  A?  .  It  can  be  shown  that  the  gradient 

projection  and  the  Simplex  method  are  identical  in  the  choice 

as  to  which  vector  is  dropped  from  and  which  vector  is  added 

to  the  basis  at  each  step.  Therefore,  the  two  methods  will 

follow  the  identical  vertex  to  vertex  path  for  a  linear 

problem  started  at  a  vertex. 

For  a  vertex  to  vertex  path  in  gradient  projection 
T  - 1 

the  inverse  matrix  (N  N  )  must  be  altered  at  each  step. 

v  m  nr  ^ 

This  is  done  by  means  of  the  recursion  relations  previously 

2 

defined  and  requires  approximately  4m  multiplications  per 

step.  In  Simplex  only  the  matrix  has  to  be  altered, 

2 

which  requires  approximately  3m  multiplications  per  step.  On 
this  basis  it  is  to  be  expected  that  Simplex  will  require  less 
computing  time  per  step  but  since  the  gradient  projection 
method  tends  to  cut  across  the  interior  of  the  convex  region 
rather  than  always  go  from  vertex  to  vertex  the  net  result 
will  be  that  although  Simplex  is  somewhat  faster  for  most 
linear  problems,  there  will  be  certain  types  of  linear 


. 
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problems  for  which  gracfl.er.it  projection  will  be  faster ,  pro¬ 
vided  that  we  are  not  at  a  vertex. 
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CHAPTER  VI 

RESULTS  AND  CONCLUSIONS 

The  purpose  of  this  thesis  was  to  review  and  analyse 
critically  the  techniques  available  for  the  numerical  solution 
of  nonlinear  systems,  to  optimize  constrained  nonlinear  func¬ 
tions,  and  to  develop  practical  methods  for  high-speed  com¬ 
puters.  In  our  approach  to  these  problems  we  assume  that  a 
digital  computer  is  available  to  perform  the  drudgery  of 
computations  invariably  involved  in  any  method  of  solution. 
This  chapter  is  subdivided  into  three  sections  as  follows: 

(i)  a  discussion  of  the  approach  used  in  this 

thesis , 

(ii)  a  condensed  review  of  material  covered  and 
conclusions  arrived  at,  and 

(iii)  an  outline  of  the  remaining  problems  in  the 
field  of  nonlinear  systems  and  nonlinear  programming. 


§  6.1  The  Approach  Used  in  the  Thesis 

The  two  or  several  variable  problem  is,  in  general, 
a  non-trivial  extension  of  the  single  variable  problem.  An 
example  of  this  is  the  great  difficulty  of  generalizing  Rolle's 
theorem  to  more  than  one  variable. 

Most  natural  processes  depend  upon  several  indepen¬ 
dent  variables  -  contrary  to  the  simplifying  assumptions  made 
by  engineers  and  scientists  -  and,  furthermore,  these  variables 
are  generally  not  separable.  Before  the  advent  of  modern 
automatic  computing  equipment  the  labour  of  computing  with 
even  a  single  independent  variable  was  considerable  and, 
with  many  variables,  prohibitive.  The  very  crudity  of  existing 
methods  is  adequate  evidence  of  a  groping  and  iterative  ap¬ 
proach  to  the  numerical  solution  of  these  problems;  the 
difficulties  recounted  in  this  thesis  are  by  no  means  trivial 
and  It  seems  unlikely  that  these  problems  will  be  completely 
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solved  in  the  near  future.  Certainly  an  improvement  of  tech¬ 
niques  should  Increase  the  range  of  problems  which  can  be 
handled  or  significantly  reduce  the  effort  necessary. 

Prohibitive  labour  is  perhaps  an  adequate  reason  but 
another  -  and  more  significant  -  reason  in  the  lag  in  the 
development  of  numerical  methods  of  coping  with  multivariate 
functions  is  the  relative  unfamiliarity  of  the  mathematics 
of  several  independent  variables.  Much  of  the  existing 
mathematics  applicable  to  problems  in  several  variables  is 
framed  in  highly  abstract  terms;  hence.,  it  is  not  immediately 
accessable  to  those  numerical  analysts  who  have  entered  the 
field  from  engineering  or  the  physical  sciences.  This  second 
reason  has  greatly  influenced  the  approach  followed  in  this 
thesis.  We  have  devoted  a  major  part  of  this  thesis  to  topics 
not  directly  applicable  to  the  nonlinear  system  and  constrain¬ 
ed  nonlinear  functions  but  these  topics  are  required  to  form 
the  necessary  background. 

§  6.2  Review  of  Thesis  and  Conclusions 

Before  we  attempted  to  solve  the  nonlinear  system 
in  several  variables  an  attempt  was  made  to  prepare  an 
adequate  background  in  the  algebraic  problems.  We  mentioned 
the  further  difficulties  that  are  encountered  in  the  poly¬ 
nomial  case  because  of  ill-conditioning  which  could  be  ad¬ 
equately  handled  only  by  resorting  to  high-precision  arith¬ 
metic.  The  iterative  methods  applicable  to  the  linear  system 
are  included  because  of  the  importance  of  the  extension  of 
these  methods  to  nonlinear  problems. 


; 


The  nonlinear  system  defined  as 
f(x)  =  0  ;  f(x)  ,  x  :  nxl 

is  handled  most  effectively,  in  general,  by  the  Conjugate 
Gradient  Method.  The  other  methods  reviewed,  such  as  Newton's, 
Steepest  Descent,  and  Iterative  methods  can  either  not  be 
applied  because  of  stringent  conditions  on  convergence  or 
they  converge  very  slowly  in  the  region  of  a  solution.  The 
Conjugate  Gradient  method  is  applicable  to  computer  program¬ 
ming  and  its  property  of  quadratic  convergence  near  a  station¬ 
ary  value  makes  it  the  most  suitable  general  purpose  method 
available.  The  use  of  the  projection  matrix  in  solving  the 
conjugation  of  successive  vectors  problem  has  been  tested  and 
found  to  be  a  practical  approach  to  this  problem.  Experience 
in  using  this  method  suggests  that  ultimate  computing  ef¬ 
ficiency  is  obtained  through  using  low  precision  arithmetic 
in  the  early  stages  of  the  method  and  then  increasing  the 
precision  as  we  approach  a  solution. 

The  more  difficult  problem  of  nonlinear  programming 
was  held  in  abeyance  until  the  basic  concepts  and  theorems 
in  linear  programming  had  been  introduced.  The  only  methods 
described  for  the  linear  programming  problem  defined  as 

\  p  ) 

maximize  <  p  x  |  Ax  <  b  x  >  0  >  ;  A:  mxn  ; 

x  ,  p  :  nxl  ,  b  :  mxl  , 

were  the  Simplex  and  Dual  Simplex  methods  although  more 
powerful  methods  are  now  available.  The  notation  applicable 
to  the  linear  problem  was  extended  to  the  nonlinear  problem 


as,  e.g.  the  quadratic  case: 

\  T  IT 

maximize  <  p  x  -  tt  x  Cx  Ax  <  b  ,  x  >  0 

l  2 

C  :  nxn  symmetric,  non-negative  definite,  A  :  mxn  ;  p  ,  x  : 
nxl  ,  b  :  mxl  .  The  methods  reviewed  were  Beale's  extension 
of  the  Simplex  method,  the  method  of  Feasible  Directions, 
and  the  Conjugate  Gradient  method.  The  preferred  method  and 
the  method  programmed  for  the  I.B.-M.  1620  computer  is  the 
Conjugate  Gradient  Method. 

§6.3  Remaining  Problems 

In  Chapter  III  we  attempted  to  pose  the  problem  of 
solving  the  nonlinear  system  and  to  partially  appraise  the 
method  of  solution  currently  available.  We  noted  that 
intuition  and  experience  were  invaluable  assets  in  matching 
the  most  effective  method  to  the  particular  problem.  A 
mathematical  result,  urgently  required,  which  would  place  the 
known  numerical  methods  of  solution  on  a  firmer  basis  is  a 
workable  criteria  for  the  existence  and  localization  of  roots 
of  nonlinear  systems  in  several  variables.  A  further  result 
which  would  be  of  immense  benefit  in  the  solution  of  the  non¬ 
linear  system  is  a  practical  method  for  a  "best"  polynomial 
approximation  to  multivariate  functions  associated  with  e.g. 
a  least  squares  fit. 

In  Chapter  V  our  treatment  of  the  nonlinear  pro¬ 
gramming  problem  was  restricted  to  the  particular  case  of 
quadratic  programming.  Another  equally  important  but  more 
difficult  case  is  convex  programming  -  the  problem  of 
minimizing  a  convex  function  (or  maximizing  a  concave 
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function)  in  a  convex  region.  The  methods  available  when 
the  constraints  are  linear  are  the  extension  of  Feasible 
Directions  by  Zoutendijk  (1959)  and  Gradient  Projection  by 
Rosen  (1961).  An  interesting  approach  to  the  convex  pro¬ 
gramming  problem  is  the  Cutting-plane  method  by  Kelley  (i960) . 
The  basic  idea  behind  this  method  is  to  construct  local 
linearizations  of  the  nonlinear  constraints  and  then  solve 
the  simpler  problem.  The  solution  is  then  improved  by  more 
accurate  linearizations  to  the  constraints  in  the  neighbour¬ 
hood  of  the  solution.  The  convexity  assumption  plays  an 
important  role  in  these  problems;  without  such  an  assump¬ 
tion  the  existing  methods  either  do  not  work  or  they  lead  at 
best  to  a  local  optimum.  Kuhn  and  Tucker  (1951)  showed  that 
the  method  of  Lagrange  Multipliers  can  be  extended  in  such  a 
way  that  a  solution  criterion  can  be  given  in  terms  of  a 
saddle  value  problem  and  further  developments  using  this 
approach  is  available  in  Arrow,,  Hurwicz,  and  Uzawa  (1958). 
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